Paris AI Running Club

AI & ML interests

None defined yet.

Recent Activity

paris-ai-running-club's activity

PLB posted an update about 19 hours ago
We fixed the LeRobot dataset format.

It's available today in the open-source version of phospho, and it's still 100% compatible with LeRobot.

The LeRobot dataset format, by Hugging Face and Remi Cadene, is becoming a standard for creating robotics datasets. But working with it can quickly become a nightmare:

- you can't delete a faulty episode. Failed a demo? Finito.
- you can't merge datasets
- you can't split datasets

So we fixed it.

Now, in the dashboard or in Python, using phospho you can:
- repair corrupted LeRobot datasets
- delete episodes from a dataset
- merge datasets
- split datasets

It's still 100% compatible with LeRobot.
Available today in the open-source version of phospho: https://github.com/phospho-app/phosphobot
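For intuition, here is a conceptual sketch of what these episode-level operations have to guarantee under the hood (contiguous re-indexing after a delete, episode-wise rather than frame-wise splits). This is not the phospho API; the `Episode` class and helpers below are illustrative only, and the real tool also has to handle videos, parquet files and metadata.

```python
# Conceptual sketch only -- NOT the phospho API. A LeRobot-style dataset is modeled
# as a plain list of episodes to show what the operations above have to guarantee.
from dataclasses import dataclass


@dataclass
class Episode:
    index: int
    frames: list  # placeholder for the per-step records of one demonstration


def delete_episode(episodes: list[Episode], index: int) -> list[Episode]:
    """Drop a faulty episode, then re-index so the dataset stays contiguous."""
    kept = [ep for ep in episodes if ep.index != index]
    return [Episode(index=i, frames=ep.frames) for i, ep in enumerate(kept)]


def merge(a: list[Episode], b: list[Episode]) -> list[Episode]:
    """Concatenate two recordings of the same task, re-indexing the second one."""
    return a + [Episode(index=len(a) + i, frames=ep.frames) for i, ep in enumerate(b)]


def split(episodes: list[Episode], ratio: float = 0.9) -> tuple[list[Episode], list[Episode]]:
    """Split into train/validation subsets by whole episodes, not by frames."""
    cut = int(len(episodes) * ratio)
    return episodes[:cut], episodes[cut:]
```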

Bigger datasets, better models!
merve posted an update 1 day ago
Google released MedGemma at I/O '25 👏 google/medgemma-release-680aade845f90bec6a3f60c4

> 4B and 27B instruction fine-tuned vision LMs and a 4B pre-trained vision LM for medicine
> available with transformers from the get-go 🤗

they also released a cool demo for scan reading ➡️ google/rad_explain

use with transformers ⤵️
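Something like the following minimal sketch should work; the repo id google/medgemma-4b-it and the image URL are assumptions, so check the collection above for the exact checkpoints.

```python
# Minimal sketch, not the official snippet: the model id and image URL are assumptions.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-4b-it",  # assumed repo id; see the MedGemma collection
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chest_xray.png"},  # placeholder image
            {"type": "text", "text": "Describe the findings in this scan."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=200)
print(out[0]["generated_text"])
```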
merve posted an update 2 days ago
You can translate this post 🤗💗
merve posted an update 2 days ago
'tis the year of any-to-any/omni models 🤠
ByteDance-Seed/BAGEL-7B-MoT is a 7B native multimodal model that understands and generates both image + text

it outperforms leading VLMs like Qwen 2.5-VL 👏 and has an Apache 2.0 license 😱
clem posted an update 3 days ago
Playing with Veo3 this morning. Share your prompt if you want me to create videos for you (bonus points if they funnily reference HF/open-source). These videos are "a cat on the moon rapping 'I love Hugging Face'"!
Jofthomas posted an update 3 days ago
Meet our new agentic model: 𝗗𝗲𝘃𝘀𝘁𝗿𝗮𝗹

Devstral is an open-source LLM built for software engineering tasks, created through a collaboration between Mistral AI and All Hands AI 🙌.

𝗞𝗲𝘆 𝗳𝗲𝗮𝘁𝘂𝗿𝗲𝘀:
• 🤖 𝗔𝗴𝗲𝗻𝘁𝘀: perfect for agentic coding
• 🍃 𝗹𝗶𝗴𝗵𝘁𝘄𝗲𝗶𝗴𝗵𝘁: Devstral is a 𝟮𝟰𝗕-parameter model based on Mistral Small.
• ©️ 𝗔𝗽𝗮𝗰𝗵𝗲 𝟮.𝟬, meaning fully open-source!
• 📄 A 𝟭𝟮𝟴𝗸 context window.

📚Blog : https://mistral.ai/news/devstral
⚡API : The model is also available on our API under the name 𝗱𝗲𝘃𝘀𝘁𝗿𝗮𝗹-𝘀𝗺𝗮𝗹𝗹-𝟮𝟱𝟬𝟱
🤗 repo : mistralai/Devstral-Small-2505

Can't wait to see what you will build with it!
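To try it over the API, a quick sketch with the mistralai Python client (v1.x assumed) could look like this; the prompt is just an example, and MISTRAL_API_KEY must be set in your environment.

```python
# Quick sketch using the mistralai Python client (v1.x API assumed).
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="devstral-small-2505",
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
)
print(response.choices[0].message.content)
```

The open weights (mistralai/Devstral-Small-2505) can also be served locally with an inference engine such as vLLM.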
merve posted an update 4 days ago
NVIDIA released a new vision reasoning model for robotics: Cosmos-Reason1-7B 🤖 nvidia/cosmos-reason1-67c9e926206426008f1da1b7

> first reasoning model for robotics
> based on Qwen 2.5-VL-7B, use with Hugging Face transformers or vLLM 🤗
> comes with SFT & alignment datasets and a new benchmark 👏
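Since it follows the Qwen 2.5-VL architecture, loading it with transformers should look roughly like the sketch below; the repo id nvidia/Cosmos-Reason1-7B, the image URL and the prompt are assumptions, and a recent transformers release is required.

```python
# Rough sketch, assuming the checkpoint id is nvidia/Cosmos-Reason1-7B and a recent
# transformers release with multimodal chat-template support.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "nvidia/Cosmos-Reason1-7B"  # assumed repo id; check the collection
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/robot_workspace.jpg"},  # placeholder
            {"type": "text", "text": "Is it safe for the gripper to pick up the cup? Reason step by step."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=256)

print(processor.decode(generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```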
reach-vb posted an update 5 days ago
hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend! 💥

as you know, we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/download speeds too): https://huggingface.co/blog/xet-on-the-hub. Now that we're certain the backend can scale even with big models like Llama 4 / Qwen 3, we're moving to the next phase of inviting impactful orgs and users on the Hub over. As you are a big part of the open-source ML community, we would love to onboard you next and create some excitement about it in the community too!

in terms of actual steps - it should be as simple as one of the org admins joining hf.co/join/xet - we'll take care of the rest.

p.s. you'd need to have the latest hf_xet version of the huggingface_hub lib, but everything else should be the same: https://huggingface.co/docs/hub/storage-backends#using-xet-storage

p.p.s. this is fully backwards compatible so everything will work as it should! 🤗
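For anyone following along: once hf_xet is in place, nothing changes in user code. A regular upload like the sketch below (repo id and filename are placeholders) already goes through xet transparently.

```python
# Sketch of a regular upload with huggingface_hub; with hf_xet installed
# (e.g. pip install -U "huggingface_hub[hf_xet]") the transfer is handled by the
# xet backend transparently -- no API changes on the user side.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are logged in or HF_TOKEN is set in the environment

api.upload_file(
    path_or_fileobj="model.safetensors",  # local file (placeholder name)
    path_in_repo="model.safetensors",     # destination path inside the repo
    repo_id="your-org/your-model",        # placeholder repo id
    repo_type="model",
)
```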
merve posted an update 5 days ago
It was the week of video generation at @huggingface, on top of many new LLMs, VLMs and more!
Let’s have a wrap 🌯 merve/may-16-releases-682aeed23b97eb0fe965345c

LLMs 💬
> Alibaba Qwen released WorldPM-72B, new World Preference Model trained with 15M preference samples (OS)
> II-Medical-8B, new LLM for medical reasoning that comes in 8B by Intelligent-Internet
> TRAIL is a new dataset by Patronus for trace error reasoning for agents (OS)

Multimodal 🖼️💬
> Salesforce Research released BLIP3o, a new any-to-any model with image-text input and image-text output 💬 it's based on an image encoder, a text decoder and a DiT, and comes in 8B
> They also released pre-training and fine-tuning datasets
> MMMG is a multimodal generation benchmark for image, audio, text (interleaved)

Image Generation ⏯️
> Alibaba Wan-AI released Wan2.1-VACE, video foundation model for image and text to video, video-to-audio and more tasks, comes in 1.3B and 14B (OS)
> ZuluVision released MoviiGen1.1, new cinematic video generation model based on Wan 2.1 14B (OS)
> multimodalart released isometric-skeumorphic-3d-bnb, an isometric 3D asset generator (like AirBnB assets) based on Flux
> LTX-Video-0.9.7-distilled is a new real-time video generation (text and image to video) model by Lightricks
> Hidream_t2i_human_preference is a new text-to-image preference dataset by Rapidata with 195k human responses from 38k annotators

Audio 🗣️
> stabilityai released stable-audio-open-small, a new text-to-audio model
> TEN-framework released ten-vad, voice activity detection model (OS)

merve posted an update 8 days ago
albertvillanova posted an update 8 days ago
Aurelien-Morgan posted an update 9 days ago
clem posted an update 10 days ago
Very cool to see pytorch contributing on Hugging Face. Time to follow them to see what they're cooking!
merve posted an update 12 days ago
VLMS 2025 UPDATE 🔥

We just shipped a blog on all the latest in vision language models, including
🤖 GUI agents, agentic VLMs, omni models
📑 multimodal RAG
⏯️ video LMs
🤏🏻 smol models
..and more! https://huggingface.co/blog/vlms-2025
clem posted an update 16 days ago
clem posted an update 18 days ago
What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the Hub, but it still feels early & there's a lot more to build. What would be useful to you?
merve posted an update 18 days ago
A ton of impactful models and datasets in open AI this past week, let's summarize the best 🤩 merve/releases-apr-21-and-may-2-6819dcc84da4190620f448a3

💬 Qwen made it rain! They released Qwen3: new dense and MoE models ranging from 0.6B to 235B 🤯 as well as Qwen2.5-Omni, an any-to-any model in 3B and 7B!
> Microsoft AI released Phi4 reasoning models (that also come in mini and plus sizes)
> NVIDIA released new CoT reasoning datasets
🖼️ > ByteDance released UI-TARS-1.5, native multimodal UI parsing agentic model
> Meta released EdgeTAM, an on-device object tracking model (SAM2 variant)
🗣️ NVIDIA released parakeet-tdt-0.6b-v2, a smol 600M automatic speech recognition model
> Nari released Dia, a 1.6B text-to-speech model
> Moonshot AI released Kimi Audio, a new audio understanding, generation, conversation model
👩🏻‍💻 JetBrains released Mellum models in base and SFT for coding
> Tesslate released UIGEN-T2-7B, a new text-to-frontend-code model 🤩
merve posted an update 19 days ago
A real-time object detector much faster and more accurate than YOLO, with an Apache 2.0 license, just landed in Hugging Face transformers 🔥

D-FINE is the SOTA real-time object detector, and it runs on a T4 (free Colab) 🤩

> Collection with all checkpoints and demo ustc-community/d-fine-68109b427cbe6ee36b4e7352

Notebooks:
> Tracking https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_tracking.ipynb
> Inference https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_inference.ipynb
> Fine-tuning https://github.com/qubvel/transformers-notebooks/blob/main/notebooks/DFine_finetune_on_a_custom_dataset.ipynb
h/t @vladislavbro @qubvel-hf @ariG23498 and the authors of the paper 🎩

Regular object detectors attempt to predict bounding boxes in pixel-perfect (x, y, w, h) coordinates, which is very rigid and hard to solve 🥲☹️

D-FINE instead formulates bounding box coordinates as distributions and refines them iteratively, which makes it more accurate 🤩

Another core idea behind this model is Global Optimal Localization Self-Distillation ⤵️

the model uses the final layer's distribution output (sort of like a teacher) to distill into earlier layers, making them more performant.
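If you just want to try inference before diving into the notebooks, a minimal sketch with the transformers object-detection pipeline could look like this; the checkpoint id and image URL are assumptions, so pick a real checkpoint from the collection above.

```python
# Minimal inference sketch; the checkpoint id and image URL are placeholders,
# pick an actual D-FINE checkpoint from the ustc-community collection.
from transformers import pipeline

detector = pipeline(
    "object-detection",
    model="ustc-community/dfine-medium-coco",  # assumed checkpoint id
)

results = detector("https://example.com/street.jpg", threshold=0.5)
for det in results:
    box = det["box"]
    print(f"{det['label']:>12} {det['score']:.2f} "
          f"({box['xmin']}, {box['ymin']}) -> ({box['xmax']}, {box['ymax']})")
```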

merve posted an update 22 days ago
clem posted an update 22 days ago