* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release! Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0
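If you want to try the new GGUF quantization backend, here is a minimal sketch of loading a GGUF-quantized Flux transformer. The checkpoint repo and filename are illustrative, and the exact class names should be checked against the release notes:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# GGUF checkpoint to load (illustrative path; any Flux GGUF file should work).
ckpt = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf"

# Load the transformer with GGUF weights, dequantizing on the fly to bfloat16.
transformer = FluxTransformer2DModel.from_single_file(
    ckpt,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Plug the quantized transformer into the full pipeline.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe("a cozy cabin in a snowy forest", num_inference_steps=28).images[0]
image.save("cabin.png")
```

The TorchAO backend is exposed in a similar way, through a quantization config object passed to the model loader.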
Introducing FineMath: the best public math pre-training dataset with 50B+ tokens! HuggingFaceTB/finemath
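To take a quick look at the data, here's a minimal sketch using the datasets library; the subset name and the `text` field are assumptions, so check the dataset card for the exact configuration:

```python
from datasets import load_dataset

# Stream the corpus so we don't download all 50B+ tokens up front.
# "finemath-4plus" is an assumed subset name; see the dataset card.
ds = load_dataset("HuggingFaceTB/finemath", "finemath-4plus", split="train", streaming=True)

for sample in ds.take(3):
    print(sample["text"][:200])  # assumed text field
```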
Math remains challenging for LLMs, and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.
We built the dataset by:
- carefully extracting math data from Common Crawl;
- iteratively filtering and recalling high-quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction (a rough sketch of this filtering step follows below).
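As an illustration of that classifier-based filtering, here is a sketch using a text-classification pipeline; the model name, label, and threshold are placeholders, not the actual FineMath classifier:

```python
from transformers import pipeline

# Placeholder checkpoint; the real FineMath classifier was trained on synthetic
# annotations of math reasoning quality.
clf = pipeline("text-classification", model="your-org/math-quality-classifier")

pages = [
    "Theorem: for any integers a and b, gcd(a, b) divides a - b. Proof: ...",
    "Buy cheap textbooks online! Free shipping on all orders.",
]
scores = clf(pages, truncation=True)

# Keep only pages the classifier rates as high-quality math (label/threshold are illustrative).
kept = [p for p, s in zip(pages, scores) if s["label"] == "math" and s["score"] > 0.9]
print(len(kept), "pages kept")
```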
We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observed notable gains compared to the baseline model and other public math datasets.
We hope this helps advance the performance of LLMs on math and reasoning! We're also releasing all the ablation models as well as the evaluation code.
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
- Faster and more accurate than Whisper
- Privacy-focused (no data leaves your device)
- WebGPU accelerated (w/ WASM fallback)
- Powered by ONNX Runtime Web and Transformers.js
In the past seven days, the Diffusers team has shipped:
1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements
Coffee on me if someone can guess 1 - 4 correctly.
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute!
How? By combining step-wise reward models with tree search algorithms :)
We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"
We're open sourcing the full recipe and sharing a detailed blog post.
In our blog post we cover:
- Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test time.
- Diverse Verifier Tree Search (DVTS): An unpublished extension we developed to the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.
- Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM. A simplified sketch of one verifier-guided strategy follows below.
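To give a flavor of verifier-guided test-time scaling, here is a simplified weighted best-of-N sketch (one of the simpler strategies, not the full DVTS recipe): sample several candidate solutions, score each with a step-wise (process) reward model, and pick the answer with the highest aggregate score. The `generate` and `score_steps` callables are placeholders rather than parts of the released toolkit:

```python
from collections import defaultdict

def weighted_best_of_n(prompt, generate, score_steps, n=16):
    """Weighted best-of-N with a process reward model (PRM).

    `generate(prompt)` samples one chain-of-thought solution from the policy model;
    `score_steps(prompt, solution)` returns one PRM score per reasoning step.
    Both are placeholders for your own sampler and reward model.
    """
    votes = defaultdict(float)
    for _ in range(n):
        solution = generate(prompt)
        step_scores = score_steps(prompt, solution)
        score = min(step_scores)                    # aggregate step scores (min is one common choice)
        answer = solution.strip().splitlines()[-1]  # crude final-answer extraction
        votes[answer] += score                      # weight each vote by its verifier score
    return max(votes, key=votes.get)                # answer with the highest total score
```

In the compute-optimal setup, the budget n and the choice of strategy (best-of-N vs. beam or tree search) are picked based on problem difficulty and the available test-time compute.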
Performance leap: TGI v3 is out. It processes 3x more tokens and is 13x faster than vLLM on long prompts. Zero config!
3x more tokens.
By reducing our memory footprint, we're able to ingest many more tokens, and more dynamically, than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM gets barely 10k. A lot of work went into reducing the footprint of the runtime, and its effects are best seen in smaller, constrained environments.

13x faster.
On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM, while they take only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5µs. Thanks @Daniël de Kok for the beast data structure.
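From the client side that caching is transparent. Here's a sketch, assuming a local TGI endpoint on port 8080, where the follow-up request reuses the cached conversation prefix:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")

# First turn pays the full prefill cost for the long prompt.
messages = [{"role": "user", "content": "Summarize this very long report: ..."}]
first = client.chat_completion(messages, max_tokens=512)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# Follow-up: the long prefix is already cached server-side, so the reply starts almost instantly.
messages.append({"role": "user", "content": "Now list the three main risks."})
second = client.chat_completion(messages, max_tokens=256)
print(second.choices[0].message.content)
```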
Zero config.

That's it. Remove all the flags you're using and you're likely to get the best performance. By evaluating the hardware and model, TGI carefully selects automatic values to give the best performance. In production, we don't have any flags anymore in our deployments. We kept all existing flags around; they may come in handy in niche scenarios.
Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. Try it out yourself!
We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! Let's take a look:
- Janus from Deepseek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
- Qwen2-VL from Qwen for dynamic-resolution image understanding
- JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
- LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
- ViTPose for pose estimation
- MGP-STR for optical character recognition (OCR)
- PatchTST & PatchTSMixer for time series forecasting
That's right, everything running 100% locally in your browser (no data sent to a server)! Huge for privacy!
How green is your model? Introducing a new feature in the Comparator tool: Environmental Impact for responsible #LLM research! open-llm-leaderboard/comparator. Now you can compare models not only by performance, but also by their environmental footprint!
The Comparator calculates CO₂ emissions during evaluation and shows key model characteristics: evaluation score, number of parameters, architecture, precision, type... Make informed decisions about your model's impact on the planet and join the movement towards greener AI!
Have you tried out Transformers.js v3? Here are the new features:
- WebGPU support (up to 100x faster than WASM)
- New quantization formats (dtypes)
- 120 supported architectures in total
- 25 new example projects and templates
- Over 1200 pre-converted models
- Node.js (ESM + CJS), Deno, and Bun compatibility
- A new home on GitHub and NPM