Humans often solve visual problems by sketching ideas in their minds. What if Vision-Language Models (VLMs) could do something similar, not by generating full images, but by using internal “mental sketches”?
That’s the idea behind Mirage, a new framework that empowers VLMs to reason using latent visual tokens. Instead of just thinking in words, Mirage mixes in abstract visual representations that help the model solve complex tasks.
These aren't photorealistic images. They're compact, internal representations optimized purely to support reasoning.
🔧 Mirage is trained in two phases:
1) Grounding: It learns to produce latent tokens anchored in real images.
2) Refinement: The model drops the images and learns to generate visual tokens on its own.
📈 And yes, it works! On challenging benchmarks like Visual Spatial Planning, Jigsaw puzzles, and Spatial Attention Tasks, Mirage clearly outperforms GPT-4o and other strong baselines. Smart sketches > empty words.
I'm incredibly excited to see the community explore sparse embeddings and hybrid search! The interpretability alone makes this a game-changer for understanding what your models are actually doing.
🙏 Thanks to @tomaarsen for this incredible opportunity!
‼️ Sentence Transformers v5.0 is out! The biggest update yet introduces Sparse Embedding models, encode method improvements, a Router module for asymmetric models & much more. Sparse + Dense = 🔥 hybrid search performance! Details:
1️⃣ Sparse Encoder Models
Brand new support for sparse embedding models that generate high-dimensional embeddings (30,000+ dims) where <1% are non-zero:
- Full SPLADE, Inference-free SPLADE, and CSR architecture support
- 4 new modules, 12 new losses, 9 new evaluators
- Integration with @elastic-co, @opensearch-project, @NAVER LABS Europe, @qdrant, @IBM, etc.
- Decode interpretable embeddings to understand token importance
- Hybrid search integration to get the best of both worlds
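To make "mostly zero, interpretable, and good for hybrid search" concrete, here is a minimal toy sketch in plain Python. It does not use the Sentence Transformers API: the dict-based vectors, the made-up weights, the function names, and the choice of reciprocal rank fusion as the hybrid step are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Toy sparse vector: token -> weight. Tokens that are absent are implicitly zero.
SparseVec = Dict[str, float]


def sparse_dot(a: SparseVec, b: SparseVec) -> float:
    """Dot product that only visits the tokens of the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[token] for token, weight in a.items() if token in b)


def top_tokens(vec: SparseVec, k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-weighted tokens: a readable view of the embedding."""
    return sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:k]


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Made-up weights, for illustration only.
query: SparseVec = {"sparse": 1.2, "search": 0.8}
docs: Dict[str, SparseVec] = {
    "doc_a": {"sparse": 0.9, "embedding": 0.7, "search": 0.1},
    "doc_b": {"dense": 1.1, "search": 0.6},
    "doc_c": {"sparse": 0.2},
}

# Rank by sparse similarity, then fuse with a pretend dense ranking.
sparse_ranking = sorted(docs, key=lambda d: sparse_dot(query, docs[d]), reverse=True)
dense_ranking = ["doc_c", "doc_a", "doc_b"]  # hypothetical dense retriever output
fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
```

Because only the non-zero entries are stored, scoring touches a handful of tokens rather than 30,000+ dimensions, and `top_tokens` shows which tokens drive a match.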
2️⃣ Enhanced Encode Methods & Multi-Processing
- New encode_query & encode_document methods that automatically use predefined prompts
- No more manual pool management - just pass a device list directly to encode()
- Much cleaner and easier to use than the old multi-process approach
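As a rough mental model of "automatically use predefined prompts", the toy wrapper below prepends a stored prompt before calling an encode function. This is a sketch under assumptions, not the library's implementation: `PromptedEncoder`, `fake_encode`, and the prompt strings are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]


class PromptedEncoder:
    """Toy wrapper that prepends a predefined prompt before encoding."""

    def __init__(
        self,
        encode_fn: Callable[[List[str]], List[Vector]],
        prompts: Dict[str, str],
    ) -> None:
        self.encode_fn = encode_fn
        self.prompts = prompts

    def encode(self, texts: List[str], prompt_name: Optional[str] = None) -> List[Vector]:
        # Unknown or missing prompt names fall back to no prefix.
        prefix = self.prompts.get(prompt_name, "") if prompt_name else ""
        return self.encode_fn([prefix + text for text in texts])

    def encode_query(self, texts: List[str]) -> List[Vector]:
        return self.encode(texts, prompt_name="query")

    def encode_document(self, texts: List[str]) -> List[Vector]:
        return self.encode(texts, prompt_name="document")


def fake_encode(texts: List[str]) -> List[Vector]:
    """Stand-in for a real model: one number per text (its length)."""
    return [[float(len(text))] for text in texts]


encoder = PromptedEncoder(fake_encode, {"query": "query: ", "document": "passage: "})
query_vecs = encoder.encode_query(["hi"])
doc_vecs = encoder.encode_document(["hi"])
```

The point is that callers no longer pick the prompt themselves: the method name says whether the text is a query or a document.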
3️⃣ Router Module & Advanced Training
- Router module with different processing paths for queries vs documents
- Custom learning rates for different parameter groups
- Composite loss logging - see individual loss components
- Perfect for two-tower architectures
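The "different processing paths" idea can be pictured with a small dispatcher like the one below. It is only an illustrative sketch: `ToyRouter`, its route names, and the string-processing steps are assumptions and do not mirror the actual Router module's interface.

```python
from typing import Any, Callable, Dict, List, Optional

Step = Callable[[Any], Any]


class ToyRouter:
    """Toy dispatcher that sends an input down a named chain of steps."""

    def __init__(self, routes: Dict[str, List[Step]], default_route: Optional[str] = None) -> None:
        self.routes = routes
        self.default_route = default_route

    def __call__(self, text: str, task: Optional[str] = None) -> Any:
        route = task or self.default_route
        if route not in self.routes:
            raise ValueError(f"Unknown route: {route!r}")
        output: Any = text
        for step in self.routes[route]:
            output = step(output)
        return output


# Hypothetical paths: a cheap one for queries, a heavier one for documents.
router = ToyRouter(
    routes={
        "query": [str.lower, str.split],
        "document": [str.lower, str.split, lambda tokens: sorted(set(tokens))],
    },
    default_route="document",
)

query_out = router("Hello World", task="query")
document_out = router("b a B")  # falls back to the default "document" path
```

In a two-tower setup the two chains would be separate model stacks rather than string functions, but the dispatch-by-input-type shape is the same.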
4️⃣ Comprehensive Documentation & Training
- New Training Overview, Loss Overview, API Reference docs
- 6 new training example documentation pages
- Full integration examples with major search engines
- Extensive blogpost on training sparse models
What's next? We would love to hear from the community! What sparse encoder models would you like to see? And what new capabilities should Sentence Transformers handle - multimodal embeddings, late interaction models, or something else? Your feedback shapes our roadmap!
Inference for generative AI models looks like a minefield, but there’s a simple protocol for picking the best inference option:
🌍 95% of users >> If you’re using open (large) models and need fast online inference, use Inference Providers on auto mode and let it choose the best provider for the model. https://huggingface.co/docs/inference-providers/index
👷 Fine-tuners / bespoke >> If you’ve got a custom setup, use Inference Endpoints to define a configuration on AWS, Azure, or GCP. https://endpoints.huggingface.co/
In case you missed it, Hugging Face expanded its collaboration with Azure a few weeks ago with a curated catalog of 10,000 models, accessible from Azure AI Foundry and Azure ML!
@alvarobartt has been cooking these last few days to prepare the one and only documentation you need if you want to deploy Hugging Face models on Azure. It comes with an FAQ, great guides, and examples on how to deploy VLMs, LLMs, and smolagents, with more to come very soon.
We need your feedback: come help us and let us know what else you want to see, which model we should add to the collection, which model task we should prioritize adding, what else we should build a tutorial for. You’re just an issue away on our GitHub repo!
Hugging Face just wrapped 4 months of deep work with AMD to push kernel-level optimization on their MI300X GPUs. Now, it's time to share everything we learned.
Join us in Paris at STATION F for a hands-on weekend of workshops and a hackathon focused on making open-source LLMs faster and more efficient on AMD.
Prizes, amazing host speakers, ... if you want more details, navigate to https://lu.ma/fmvdjmur!
🚀 SmolAgents v1.19.0 is live! This release brings major improvements to agent flexibility, UI usability, streaming architecture, and developer experience: making it easier than ever to build smart, interactive AI agents. Here's what's new:
🔧 Agent Upgrades
- Support for managed agents in ToolCallingAgent
- Context manager support for cleaner agent lifecycle handling
- Output formatting now uses XML tags for consistency
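For readers unfamiliar with the pattern, "context manager support" usually means an object can be used in a `with` block so cleanup runs even if a step fails. The sketch below shows that pattern with a hypothetical `ToyAgent` class. It is not the smolagents API, and what the real agents release on exit is not stated in this post.

```python
from typing import List


class ToyAgent:
    """Toy agent whose resources are opened and released by a `with` block."""

    def __init__(self) -> None:
        self.resources_open = False
        self.memory: List[str] = []

    def __enter__(self) -> "ToyAgent":
        self.resources_open = True  # e.g. start an executor or sandbox
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.cleanup()
        return False  # do not swallow exceptions raised inside the block

    def cleanup(self) -> None:
        self.resources_open = False

    def run(self, task: str) -> str:
        if not self.resources_open:
            raise RuntimeError("Agent resources are not open")
        self.memory.append(task)
        return f"done: {task}"


with ToyAgent() as agent:
    result = agent.run("summarize the release notes")

# After the block, cleanup has already happened.
closed_after_block = not agent.resources_open
```

The benefit is that callers cannot forget the cleanup call, including on error paths.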
🖥️ UI Enhancements
- GradioUI now supports reset_agent_memory: perfect for fresh starts in dev & demos.
🔄 Streaming Refactor
- Streaming event aggregation moved off the Model class
- ➡️ Better architecture & maintainability
📦 Output Tracking
- CodeAgent outputs are now stored in ActionStep
- ✅ More visibility and structure to agent decisions
🐛 Bug Fixes
- Smarter planning logic
- Cleaner Docker logs
- Better prompt formatting for additional_args
- Safer internal functions and final answer matching
📚 Docs Improvements
- Added quickstart examples with tool usage
- One-click Colab launch buttons
- Expanded reference docs (AgentMemory, GradioUI docstrings)
- Fixed broken links and migrated to .md format