This is available today in the open-source version of phospho. It remains 100% compatible with LeRobot.
The LeRobot dataset format by Hugging Face and Remi Cadene is becoming a standard for creating robotics datasets. But working with it can quickly become a nightmare:
- you can't delete a faulty episode. Failed a demo? Finito.
- you can't merge datasets
- you can't split datasets
So we fixed it.
Now, in the dashboard or in Python, using phospho you can:
- repair corrupted LeRobot datasets
- delete episodes from a dataset
- merge datasets
- split datasets
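Roughly, the Python side could look like the sketch below. This is only an illustration of the workflow: the module path, class, and method names (`phosphobot.dataset`, `Dataset`, `repair`, `delete_episode`, `merge`, `split`) are assumptions, not phospho's confirmed API, so check the phospho docs for the exact calls.

```python
# Hypothetical sketch of the dataset-editing workflow described above.
# NOTE: import path, class and method names are assumed for illustration;
# refer to the phospho documentation for the actual API.
from phosphobot.dataset import Dataset  # assumed import path

# Load a LeRobot-format dataset and repair corrupted metadata/episodes
ds = Dataset("datasets/my_lerobot_dataset")
ds.repair()

# Delete a faulty episode by index
ds.delete_episode(episode_id=12)

# Merge two datasets into a new one
other = Dataset("datasets/other_dataset")
merged = ds.merge(other, output_path="datasets/merged_dataset")

# Split a dataset, e.g. 80/20
train, val = merged.split(ratio=0.8)
```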
Playing with Veo3 this morning. Share your prompt if you want me to create videos for you (bonus points if they funnily reference HF/open-source). These videos are "a cat on the moon rapping 'I love Hugging Face'"!
> first reasoning model for robotics
> based on Qwen 2.5-VL-7B, use with Hugging Face transformers or vLLM 🤗
> comes with SFT & alignment datasets and a new benchmark 👏
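Since it builds on Qwen 2.5-VL-7B, loading it with transformers should follow the standard Qwen2.5-VL path. The snippet below is a minimal sketch: the quoted post doesn't name the checkpoint, so the base Qwen/Qwen2.5-VL-7B-Instruct repo and the image URL are stand-ins to swap for the real ones.

```python
# Minimal sketch: running a Qwen2.5-VL-7B-based checkpoint with transformers.
# The repo id below is a stand-in (the robotics reasoning checkpoint isn't
# named in the post); replace it with the actual model id.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # stand-in repo id
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/robot_scene.jpg"},  # placeholder image
            {"type": "text", "text": "Reason about what the robot arm should do next."},
        ],
    }
]
print(pipe(text=messages, max_new_tokens=128))
```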
hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend! 💥
as you know, we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/download speeds too): https://huggingface.co/blog/xet-on-the-hub. now that we're certain the backend can scale even with big models like Llama 4 / Qwen 3, we're moving to the next phase: inviting impactful orgs and users on the hub over. since you're a big part of the open source ML community, we'd love to onboard you next and create some excitement about it in the community too!
in terms of actual steps, it should be as simple as one of the org admins joining hf.co/join/xet - we'll take care of the rest.
LLMs 💬
> Alibaba Qwen released WorldPM-72B, a new World Preference Model trained with 15M preference samples (OS)
> II-Medical-8B, a new 8B LLM for medical reasoning by Intelligent-Internet
> TRAIL is a new dataset by Patronus for trace error reasoning for agents (OS)
Multimodal 🖼️💬
> Salesforce Research released BLIP3o, a new any-to-any model with image-text input and image-text output 💬 it’s based on an image encoder, a text decoder and a DiT, and comes in 8B
> They also released pre-training and fine-tuning datasets
> MMMG is a multimodal generation benchmark for image, audio, text (interleaved)
Image Generation ⏯️
> Alibaba Wan-AI released Wan2.1-VACE, a video foundation model for image and text to video, video-to-audio and more tasks, comes in 1.3B and 14B (OS)
> ZuluVision released MoviiGen1.1, a new cinematic video generation model based on Wan 2.1 14B (OS)
> multimodalart released isometric-skeumorphic-3d-bnb, an isometric 3D asset generator (like AirBnB assets) based on Flux
> LTX-Video-0.9.7-distilled is a new real-time video generation (text and image to video) model by Lightricks
> Hidream_t2i_human_preference is a new text-to-image preference dataset by Rapidata with 195k human responses from 38k annotators
Audio 🗣️
> stabilityai released stable-audio-open-small, a new text-to-audio model
> TEN-framework released ten-vad, a voice activity detection model (OS)
New in smolagents v1.16.0:
🔍 Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
🔧 Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
📚 Better docs
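For instance, wiring the Bing-backed search into an agent could look like the sketch below. The `engine="bing"` keyword and the exact model class are assumptions based on the release note, so check the smolagents docs for the precise argument names.

```python
# Sketch of using the Bing-backed WebSearchTool inside a CodeAgent.
# engine="bing" is an assumption based on the release note; verify the
# actual keyword in the smolagents documentation.
from smolagents import CodeAgent, InferenceClientModel, WebSearchTool

search = WebSearchTool(engine="bing")  # assumed keyword for the new Bing support
agent = CodeAgent(tools=[search], model=InferenceClientModel())

agent.run("What did smolagents ship in v1.16.0?")
```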
Hey, I'll be presenting @retrain-pipelines and almighty function-calling at the Hugging Face Paris HQ, you guys. Monday evening. Lightning-talk style. With AI Tinkerers.
We just shipped a blog post on all the latest in vision language models, including:
🤖 GUI agents, agentic VLMs, omni models
📑 multimodal RAG
⏯️ video LMs
🤏🏻 smol models
..and more! https://huggingface.co/blog/vlms-2025
What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the hub, but it still feels early & there's a lot more to build. What would be useful to you?
💬 Qwen made it rain! They released Qwen3: new dense and MoE models ranging from 0.6B to 235B 🤯 as well as Qwen2.5-Omni, an any-to-any model in 3B and 7B!
> Microsoft AI released Phi4 reasoning models (that also come in mini and plus sizes)
> NVIDIA released new CoT reasoning datasets
🖼️
> ByteDance released UI-TARS-1.5, a native multimodal UI parsing agentic model
> Meta released EdgeTAM, an on-device object tracking model (SAM2 variant)
🗣️ NVIDIA released parakeet-tdt-0.6b-v2, a smol 600M automatic speech recognition model
> Nari released Dia, a 1.6B text-to-speech model
> Moonshot AI released Kimi Audio, a new audio understanding, generation and conversation model
👩🏻💻 JetBrains released Mellum models in base and SFT for coding
> Tesslate released UIGEN-T2-7B, a new text-to-frontend-code model 🤩
you can easily fine-tune, quantize, and play with the sota vision LM InternVL3 now 🔥 we have recently merged InternVL3 into Hugging Face transformers and released converted checkpoints 🤗
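A minimal way to try it is through the transformers pipeline, as in the sketch below. The repo id is an assumption about the naming of the converted checkpoints; check the OpenGVLab org on the Hub for the exact ids.

```python
# Minimal sketch: running InternVL3 via the transformers image-text-to-text pipeline.
# The checkpoint name below is an assumption about the converted repos;
# verify the exact id on the Hub.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="OpenGVLab/InternVL3-1B-hf",  # assumed converted checkpoint id
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
print(pipe(text=messages, max_new_tokens=64))
```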