Need4Speed

company

Activity Feed

AI & ML interests

None defined yet.

need-for-speed's activity

lvwerra

authored a paper 21 days ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published 22 days ago • 53

wenhuach

posted an update about 1 month ago

Post

2330

Are we the only providers of INT4 quantized models for Llama 3.2 VL?
OPEA/Llama-3.2-90B-Vision-Instruct-int4-sym-inc
OPEA/Llama-3.2-11B-Vision-Instruct-int4-sym-inc

3 replies

wenhuach

posted an update about 2 months ago

Post

1816

AutoRound has demonstrated strong results even at 2-bit precision for VLMs like Qwen2-VL-72B. Check it out here: OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc.

4 replies

wenhuach

posted an update 2 months ago

Post

341

This week, OPEA Space released several new INT4 models, including:
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
allenai/OLMo-2-1124-13B-Instruct
THUDM/glm-4v-9b
AIDC-AI/Marco-o1
and several others.
Let us know which models you'd like prioritized for quantization, and we'll do our best to make it happen!

https://huggingface.co/OPEA

3 replies

Haihao

authored a paper 2 months ago

A dynamic parallel method for performance optimization on hybrid CPUs

Paper • 2411.19542 • Published Nov 29, 2024 • 5

wenhuach

posted an update 2 months ago

Post

984

OPEA space just released nearly 20 INT4 models, for example QwQ-32B-Preview,
Llama-3.2-11B-Vision-Instruct, Qwen2.5, and Llama 3.1. Check them out at https://huggingface.co/OPEA

loubnabnl

posted an update 2 months ago

Post

2217

Making SmolLM2 reproducible: open-sourcing our training & evaluation toolkit 🛠️ https://github.com/huggingface/smollm/

- Pre-training code with nanotron
- Evaluation suite with lighteval
- Synthetic data generation using distilabel (powers our new SFT dataset HuggingFaceTB/smoltalk)
- Post-training scripts with TRL & the alignment handbook
- On-device tools with llama.cpp for summarization, rewriting & agents

Apache 2.0 licensed. V2 pre-training data mix coming soon!

Which other tools should we add next?

ofirzaf

authored 2 papers 3 months ago

Q8BERT: Quantized 8Bit BERT

Paper • 1910.06188 • Published Oct 14, 2019 • 2

FastDraft: How to Train Your Draft

Paper • 2411.11055 • Published Nov 17, 2024 • 10

orenpereg

authored 5 papers 3 months ago

Term Set Expansion based on Multi-Context Term Embeddings: an End-to-end Workflow

Paper • 1807.10104 • Published Jul 26, 2018 • 1

ABSApp: A Portable Weakly-Supervised Aspect-Based Sentiment Extraction System

Paper • 1909.05608 • Published Sep 12, 2019

Cross-Domain Aspect Extraction using Transformers Augmented with Knowledge Graphs

Paper • 2210.10144 • Published Oct 18, 2022

Efficient Few-Shot Learning Without Prompts

Paper • 2209.11055 • Published Sep 22, 2022 • 3

Accelerating Speculative Decoding using Dynamic Speculation Length

Paper • 2405.04304 • Published May 7, 2024 • 2

lvwerra

authored a paper 3 months ago

SelfCodeAlign: Self-Alignment for Code Generation

Paper • 2410.24198 • Published Oct 31, 2024 • 23

Haihao

authored 3 papers 4 months ago

moshew

authored a paper 6 months ago

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

Paper • 2408.02545 • Published Aug 5, 2024 • 36

wenhuach

posted an update 6 months ago

Post

652

Looking for a better INT4 algorithm for Llama 3.1? For the 8B model, AutoRound achieves a higher average score than AWQ across 10 zero-shot tasks: 63.93 versus 63.15. On MMLU it scores 66.72 versus 65.25, and on ARC-C 52.13 versus 50.94. For further details and comparisons, see the leaderboard at Intel/low_bit_open_llm_leaderboard.
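
To make the size of those gaps easier to read, here is a minimal sketch that computes the AutoRound-minus-AWQ difference for each score quoted in the post above. The numbers are copied from the post; the `scores` table and the `score_gaps` helper are hypothetical names introduced only for this illustration and are not part of AutoRound or the leaderboard.

```python
# Scores quoted in the post above for the Llama 3.1 8B model at INT4,
# stored as (AutoRound, AWQ). The dictionary layout is illustrative only.
scores = {
    "average over 10 zero-shot tasks": (63.93, 63.15),
    "MMLU": (66.72, 65.25),
    "ARC-C": (52.13, 50.94),
}


def score_gaps(table):
    """Return {task: AutoRound score minus AWQ score}, rounded to 2 decimals."""
    return {
        task: round(autoround - awq, 2)
        for task, (autoround, awq) in table.items()
    }


gaps = score_gaps(scores)
for task, gap in gaps.items():
    # A positive gap means AutoRound scored higher than AWQ on that task.
    print(f"{task}: {gap:+.2f} points")
```

All three gaps are positive, in the range of roughly one to one and a half points, which matches the post's claim of an improvement over AWQ on these benchmarks.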

Team members 20