Qwen2.5-Omni is soooo good that people build multimodal reasoning models on top of it 🥹
> KE-Team/Ke-Omni-R-3B is an open-source audio reasoning model, SOTA on benchmark averages, based on Qwen/Qwen2.5-Omni-3B 🗣️
> Haoz0206/Omni-R1 is a video reasoning model with pixel-level grounding, and it's super competitive ⏯️ based on Qwen/Qwen2.5-Omni-7B
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯
🔐 Privacy by design (no data leaves your device)
💰 Completely free... forever
📦 Zero installation required, just visit a website
⚡️ Blazingly fast WebGPU-accelerated inference
For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text-to-speech
Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!
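The turn-taking logic of such a pipeline boils down to a simple loop. Here's a rough Python sketch with stub functions standing in for the actual models (the real demo runs in JavaScript via Transformers.js; the function names `detect_speech`, `transcribe`, `generate_reply`, and `synthesize` are made up for illustration):

```python
def detect_speech(audio_chunk):
    # Stub for Silero VAD: report whether the chunk contains voice activity.
    return len(audio_chunk) > 0

def transcribe(audio_chunk):
    # Stub for Whisper speech recognition.
    return "hello there"

def generate_reply(text):
    # Stub for SmolLM2-1.7B text generation.
    return f"You said: {text}"

def synthesize(text):
    # Stub for Kokoro text-to-speech; returns fake audio samples.
    return [0.0] * len(text)

def conversation_step(audio_chunk):
    """One turn of the VAD -> ASR -> LLM -> TTS loop."""
    if not detect_speech(audio_chunk):
        return None  # silence: nothing to respond to
    text = transcribe(audio_chunk)
    reply = generate_reply(text)
    return synthesize(reply)

reply_audio = conversation_step([0.1, 0.2, 0.3])
print(reply_audio is not None)  # → True: a reply was synthesized
```

The VAD gate is what makes this feel real-time: the expensive models only run when someone is actually speaking.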
We have been working on a project called kernels. kernels makes it possible to load compute kernels directly from the Hub! 🚀
We plan to give kernels a proper introduction soon. But for those who have been following along, we are happy to announce a new release:
- New layer API with torch.compile support.
- Experimental support for loading Apple Silicon Metal kernels 🤘
- Generate wheels from Hub kernels for legacy deployments.
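To give a flavor of the idea (this is not the real kernels API — the registry and every name below are invented for illustration), resolving an optimized kernel by name with a pure-Python reference fallback could look like:

```python
import math

def gelu_reference(x):
    # Plain-Python reference implementation (tanh approximation of GELU),
    # standing in for an optimized compute kernel fetched from a hub.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# Hypothetical registry mapping kernel names to implementations.
KERNEL_REGISTRY = {"example-org/gelu": gelu_reference}

def load_kernel(name, fallback=None):
    """Return the registered kernel for `name`, else the fallback."""
    return KERNEL_REGISTRY.get(name, fallback)

gelu = load_kernel("example-org/gelu", fallback=gelu_reference)
print(round(gelu(1.0), 4))  # → 0.8412
```

The appeal of the real project is that the "registry" is the Hub itself, so models can pull in hardware-specific kernels without vendoring compiled code.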
If you haven't yet, you should read the technical report for SmolVLA, published yesterday by the Hugging Face robotics team! ➡️ Amongst other ideas, it introduces "Async inference" to speed up robot actions.
Robots have a problem: performing actions takes time (unlike agents, where action execution is near-instant!). Most often, robots wait until they've finished executing their current actions before starting to think about the next steps. This is a huge latency cost!
So the team decided to have the PolicyServer (aka the "thinking" part) start early: instead of waiting for all n actions they just sent to finish, they collect observations after k < n steps and start preparing the next actions from those while steps k through n are still running, so the next chunk is ready to send as soon as the current one completes.
➡️ This boosted robot throughput by ~30%! (nearly 2× tasks per time window).
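A back-of-the-envelope sketch of why overlapping planning with execution helps (all timings and chunk sizes below are made up for illustration, not taken from the report):

```python
# Simulated timings (seconds) -- illustrative values only.
PLAN_TIME = 1.0   # time the PolicyServer needs to produce an action chunk
STEP_TIME = 0.25  # time the robot takes to execute one action
N = 8             # actions per chunk
K = 4             # step after which planning for the next chunk starts
CHUNKS = 10       # chunks in the whole task

def sync_total():
    # Sequential: plan, then execute all n steps, then plan again.
    return CHUNKS * (PLAN_TIME + N * STEP_TIME)

def async_total():
    # Overlapped: planning the next chunk starts after k steps, so it
    # hides behind the remaining (n - k) steps of execution; the robot
    # only stalls for whatever planning time is left over.
    overlap = min(PLAN_TIME, (N - K) * STEP_TIME)
    stall_per_chunk = PLAN_TIME - overlap
    return PLAN_TIME + CHUNKS * N * STEP_TIME + (CHUNKS - 1) * stall_per_chunk

print(sync_total(), async_total())  # → 30.0 21.0
```

With these toy numbers, planning fully hides behind execution and the robot never idles between chunks; the real gain depends on how planning time compares to (n - k) steps of execution.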
This is the story of how open source AI created a $3M business for a news company:
Clare Spencer recounts on the GAIN blog how a Danish software engineer found OpenAI's Whisper model and turned it into Good Tape. It's now generating $3M ARR for news service Zetland.
Great playbook on how to build a good product:
- The idea came from a software engineer, Jakob Steinn, who not only spotted a new model but also listened to feedback from his colleagues in the newsroom (he thought they'd use it for translation, but they were more interested in transcription in Danish)
- They built iteratively: from running the model in the terminal, to a notebook, to a full-fledged web interface
- They didn't just wrap the API: they rebuilt the transcription engine from scratch, moved it to TPUs for 45-second processing of hour-long audio, and added EU-based data sovereignty
Now Good Tape has 2.5M users worldwide, with only 30-35% being journalists. Small languages (Danish, Finnish, Croatian, Hebrew) were underserved by existing tools - suddenly there's a "very very big market" when you put them together.
This shows how open source AI can solve real workflow problems and create sustainable businesses. Sometimes the best opportunities emerge from solving your own daily problems.