Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯
🗣️ Transcribe videos, meeting notes, songs and more
🔐 Runs on-device, meaning no data is sent to a server
🌎 Multilingual (8 languages)
🤗 Completely free (forever) & open source
That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥
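If you want to build something similar, here's a minimal sketch using the Transformers.js pipeline API. The model id and the use of the standard ASR pipeline for Voxtral are my assumptions; check the demo source for the exact wiring:

```js
// Minimal sketch: in-browser transcription with Transformers.js on WebGPU.
import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline(
  "automatic-speech-recognition",
  "onnx-community/Voxtral-Mini-3B-2507-ONNX", // assumed repo id
  { device: "webgpu" },
);

// Accepts a URL or raw audio samples (Float32Array at 16kHz).
const { text } = await transcriber("meeting.wav");
console.log(text);
```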
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯
🔐 Privacy by design (no data leaves your device)
💰 Completely free... forever
📦 Zero installation required, just visit a website
⚡️ Blazingly-fast WebGPU-accelerated inference
For those interested, here's how it works (rough sketch below):
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text-to-speech
Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!
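Here's a rough sketch of how those pieces can be chained together. The model ids, the kokoro-js usage, and the VAD glue are assumptions on my part; the real demo's plumbing is more involved:

```js
// Rough sketch of the speech-to-speech loop (model ids are assumptions).
import { pipeline } from "@huggingface/transformers";
import { KokoroTTS } from "kokoro-js";

const transcriber = await pipeline(
  "automatic-speech-recognition", "onnx-community/whisper-base", { device: "webgpu" });
const generator = await pipeline(
  "text-generation", "HuggingFaceTB/SmolLM2-1.7B-Instruct", { device: "webgpu" });
const tts = await KokoroTTS.from_pretrained("onnx-community/Kokoro-82M-v1.0-ONNX");

// Called whenever the VAD (e.g. Silero) detects the end of an utterance.
async function respond(audioChunk) {
  const { text } = await transcriber(audioChunk);
  const reply = await generator(
    [{ role: "user", content: text }], { max_new_tokens: 256 });
  const assistantTurn = reply[0].generated_text.at(-1);
  const audio = await tts.generate(assistantTurn.content, { voice: "af_heart" });
  // ...play `audio` back to the user
}
```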
hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend! 💥
as you know, we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/download speeds too): https://huggingface.co/blog/xet-on-the-hub - and now that we're certain the backend can scale with even big models like Llama 4/Qwen 3, we're moving to the next phase of inviting impactful orgs and users on the hub over. as you are a big part of the open source ML community, we would love to onboard you next and create some excitement about it in the community too!
in terms of actual steps - it should be as simple as one of the org admins joining hf.co/join/xet - we'll take care of the rest.
Always surprised that so few people actually read the FineTasks blog, on ✨how to select training evals with the highest signal✨
If you're serious about training models without wasting compute on shitty runs, you absolutely should read it!!
A high-signal eval actually tells you precisely, during training, how well & what your model is learning, allowing you to discard the bad runs/bad samplings/...!
The blog covers prompt choice, metrics, and datasets in depth, across languages/capabilities, and my fave section is "which properties should evals have" 👌 (to know how to select the best evals for your use case)
Introducing the ONNX model explorer: Browse, search, and visualize neural networks directly in your browser. 🤯 A great tool for anyone studying Machine Learning! We're also releasing the entire dataset of graphs so you can use them in your own projects! 🤗
Gemini 2.5 Flash is here! We're excited to launch our first hybrid reasoning Gemini model. In 2.5 Flash, developers can turn thinking off.
**TL;DR:**
- 🧠 Controllable "thinking" with a thinking budget of up to 24k tokens (sketch below)
- 🌌 1 million token multimodal input context for text, image, video, audio, and PDF
- 🛠️ Function calling, structured output, Google Search & code execution
- 🏦 $0.15 per 1M input tokens; $0.60 (thinking off) or $3.50 (thinking on) per 1M output tokens (thinking tokens are billed as output tokens)
- 💡 Knowledge cutoff of January 2025
- 🚀 Rate limits - free tier: 10 RPM, 500 req/day
- 🏅 Outperforms 2.0 Flash on every benchmark
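If you're curious what the thinking budget looks like in code, here's a minimal sketch with the @google/genai JS SDK (the model id below was the preview name at launch, so double-check the docs for the current one):

```js
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash-preview-04-17", // launch-time preview id
  contents: "How does photosynthesis work?",
  config: {
    // 0 disables thinking; up to 24576 tokens enables deeper reasoning.
    thinkingConfig: { thinkingBudget: 0 },
  },
});
console.log(response.text);
```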
Reasoning models like o3 and o4-mini are advancing faster than ever, but imagine what will be possible when they can run locally in your browser! 🤯
Well, with 🤗 Transformers.js, you can do just that! Here's Zyphra's new ZR1 model running at over 100 tokens/second on WebGPU! ⚡️
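In case you want to try this yourself, here's a minimal Transformers.js sketch (the ONNX repo id and dtype are my assumptions):

```js
import { pipeline } from "@huggingface/transformers";

// Assumed community ONNX export of Zyphra's ZR1 reasoning model.
const generator = await pipeline(
  "text-generation",
  "onnx-community/ZR1-1.5B-ONNX",
  { device: "webgpu", dtype: "q4f16" },
);

const messages = [{ role: "user", content: "What is 97 * 103?" }];
const output = await generator(messages, { max_new_tokens: 512 });
console.log(output[0].generated_text.at(-1).content);
```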
Giving models access to browser APIs (like File System, Screen Capture, and more) could unlock an entirely new class of web experiences that are personalized, interactive, and run locally in a secure, sandboxed environment.
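Purely as a hypothetical sketch of that idea, imagine pairing the File System Access API with a local model (the model id is illustrative, and showOpenFilePicker is currently Chromium-only):

```js
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation", "onnx-community/Qwen2.5-0.5B-Instruct", { device: "webgpu" });

// Let the user pick a local file; its contents never leave the device.
const [handle] = await window.showOpenFilePicker();
const file = await handle.getFile();
const notes = await file.text();

const output = await generator(
  [{ role: "user", content: `Summarize these notes:\n${notes}` }],
  { max_new_tokens: 256 },
);
console.log(output[0].generated_text.at(-1).content);
```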
Gemini 2.5 Pro, thinking by default! We're excited to launch our best Gemini model for reasoning, multimodality and coding yet! #1 on LMArena and state-of-the-art on Humanity's Last Exam, AIME, GPQA and more!
TL;DR:
- 💻 Best Gemini coding model yet, particularly for web development (excels on LiveCodeBench)
- 🧠 Default "thinking" with up to 64k token output
- 🌌 1 million token multimodal input context for text, image, video, audio, and PDF
- 🛠️ Function calling, structured output, Google Search & code execution
- 🏆 #1 on LMArena & SOTA on AIME, GPQA, Humanity's Last Exam
- 💡 Knowledge cutoff of January 2025
- 🤗 Available for free as Experimental in AI Studio, the Gemini API & the Gemini app
- 🚀 Rate limits - free tier: 2 RPM, 50 req/day
The Gemma 3 family is out! I was reading the tech report, and this section was really interesting to me from a methods/scientific-fairness pov.
Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**. (Which everybody does, but people usually don't say)
For a tech report, it makes a lot of sense to report model performance when used optimally! On leaderboards, on the other hand, comparisons will be apples to apples, but in a potentially suboptimal way for a given model family (just as some users interact sub-optimally with models).
It also contains a cool section (6) on training data memorization rates! It's important to see whether your model will output the training data it has seen verbatim: always an issue for privacy/copyright/... but also very much for evaluation!
Because if your model knows its evals by heart, you're not testing for generalization.
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!
They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.
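Concretely, installing from one of these tags should be as simple as pointing pip at the tag (assuming the usual git-based install):

```sh
pip install git+https://github.com/huggingface/transformers@v4.49.0-SmolVLM-2
```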
This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).
Starting with SmolVLM-2 & SigLIP-2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.
Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.