This is available today in the open-source version of phospho. It remains 100% compatible with LeRobot.
The LeRobot dataset format by Hugging Face and Remi Cadene is becoming a standard for creating robotics datasets. But working with it can quickly become a nightmare:
- you can't delete a faulty episode. Failed a demo? Finito.
- you can't merge datasets
- you can't split datasets
So we fixed it.
Now, in the dashboard or in Python, using phospho you can:
- repair corrupted LeRobot datasets
- delete episodes from a dataset
- merge datasets
- split datasets
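Roughly, the Python side could look like the sketch below. This is only an illustration of the workflow: the module path, class, and method names (`phosphobot.dataset`, `Dataset`, `repair`, `delete_episode`, `merge`, `split`) are assumptions, not phospho's confirmed API, so check the phospho docs for the exact calls.

```python
# Hypothetical sketch of the dataset-editing workflow described above.
# NOTE: import path, class and method names are assumed for illustration;
# refer to the phospho documentation for the actual API.
from phosphobot.dataset import Dataset  # assumed import path

# Load a LeRobot-format dataset and repair corrupted metadata/episodes
ds = Dataset("datasets/my_lerobot_dataset")
ds.repair()

# Delete a faulty episode by index
ds.delete_episode(episode_id=12)

# Merge two datasets into a new one
other = Dataset("datasets/other_dataset")
merged = ds.merge(other, output_path="datasets/merged_dataset")

# Split a dataset, e.g. 80/20
train, val = merged.split(ratio=0.8)
```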
Playing with Veo3 this morning. Share your prompt if you want me to create videos for you (bonus points if they funnily reference HF/open-source). These videos are "a cat on the moon rapping 'I love Hugging Face'"!
> first reasoning model for robotics
> based on Qwen 2.5-VL-7B, use with Hugging Face transformers or vLLM 🤗
> comes with SFT & alignment datasets and a new benchmark 👏
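Since it builds on Qwen 2.5-VL-7B, loading it with transformers should follow the standard Qwen2.5-VL path. The snippet below is a minimal sketch: the quoted post doesn't name the checkpoint, so the base Qwen/Qwen2.5-VL-7B-Instruct repo and the image URL are stand-ins to swap for the real ones.

```python
# Minimal sketch: running a Qwen2.5-VL-7B-based checkpoint with transformers.
# The repo id below is a stand-in (the robotics reasoning checkpoint isn't
# named in the post); replace it with the actual model id.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # stand-in repo id
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/robot_scene.jpg"},  # placeholder image
            {"type": "text", "text": "Reason about what the robot arm should do next."},
        ],
    }
]
print(pipe(text=messages, max_new_tokens=128))
```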
hey hey @mradermacher - VB from Hugging Face here, we'd love to onboard you over to our optimised xet backend! 💥
as you know, we're in the process of upgrading our storage backend to xet (which helps us scale and offer blazingly fast upload/download speeds too): https://huggingface.co/blog/xet-on-the-hub. now that we're certain the backend can scale even with big models like Llama 4 / Qwen 3, we're moving to the next phase: inviting impactful orgs and users on the hub over. since you're a big part of the open source ML community, we'd love to onboard you next and create some excitement about it in the community too!
in terms of actual steps, it should be as simple as one of the org admins joining hf.co/join/xet - we'll take care of the rest.
LLMs 💬
> Alibaba Qwen released WorldPM-72B, a new World Preference Model trained with 15M preference samples (OS)
> II-Medical-8B, a new 8B LLM for medical reasoning by Intelligent-Internet
> TRAIL is a new dataset by Patronus for trace error reasoning for agents (OS)
Multimodal 🖼️💬
> Salesforce Research released BLIP3o, a new any-to-any model with image-text input and image-text output 💬 it’s based on an image encoder, a text decoder and a DiT, and comes in 8B
> They also released pre-training and fine-tuning datasets
> MMMG is a multimodal generation benchmark for image, audio, text (interleaved)
Image Generation ⏯️
> Alibaba Wan-AI released Wan2.1-VACE, a video foundation model for image and text to video, video-to-audio and more tasks, comes in 1.3B and 14B (OS)
> ZuluVision released MoviiGen1.1, a new cinematic video generation model based on Wan 2.1 14B (OS)
> multimodalart released isometric-skeumorphic-3d-bnb, an isometric 3D asset generator (like AirBnB assets) based on Flux
> LTX-Video-0.9.7-distilled is a new real-time video generation (text and image to video) model by Lightricks
> Hidream_t2i_human_preference is a new text-to-image preference dataset by Rapidata with 195k human responses from 38k annotators
Audio 🗣️
> stabilityai released stable-audio-open-small, a new text-to-audio model
> TEN-framework released ten-vad, a voice activity detection model (OS)
New in smolagents v1.16.0:
🔍 Bing support in WebSearchTool
🐍 Custom functions & executor_kwargs in LocalPythonExecutor
🔧 Streaming GradioUI fixes
🌐 Local web agents via api_base & api_key
📚 Better docs
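For instance, wiring the Bing-backed search into an agent could look like the sketch below. The `engine="bing"` keyword and the exact model class are assumptions based on the release note, so check the smolagents docs for the precise argument names.

```python
# Sketch of using the Bing-backed WebSearchTool inside a CodeAgent.
# engine="bing" is an assumption based on the release note; verify the
# actual keyword in the smolagents documentation.
from smolagents import CodeAgent, InferenceClientModel, WebSearchTool

search = WebSearchTool(engine="bing")  # assumed keyword for the new Bing support
agent = CodeAgent(tools=[search], model=InferenceClientModel())

agent.run("What did smolagents ship in v1.16.0?")
```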
Hey, I'll be presenting @retrain-pipelines and almighty function-calling at the Hugging Face Paris HQ, you guys. Monday evening. Lightning-talk style. With AI Tinkerers.
We just shipped a blog post on all the latest in vision language models, including:
🤖 GUI agents, agentic VLMs, omni models
📑 multimodal RAG
⏯️ video LMs
🤏🏻 smol models
..and more! https://huggingface.co/blog/vlms-2025
What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the hub, but it still feels early & there's a lot more to build. What would be useful to you?
💬 Qwen made it rain! They released Qwen3: new dense and MoE models ranging from 0.6B to 235B 🤯 as well as Qwen2.5-Omni, an any-to-any model in 3B and 7B!
> Microsoft AI released Phi4 reasoning models (that also come in mini and plus sizes)
> NVIDIA released new CoT reasoning datasets
🖼️
> ByteDance released UI-TARS-1.5, a native multimodal UI parsing agentic model
> Meta released EdgeTAM, an on-device object tracking model (SAM2 variant)
🗣️ NVIDIA released parakeet-tdt-0.6b-v2, a smol 600M automatic speech recognition model
> Nari released Dia, a 1.6B text-to-speech model
> Moonshot AI released Kimi Audio, a new audio understanding, generation and conversation model
👩🏻💻 JetBrains released Mellum models in base and SFT for coding
> Tesslate released UIGEN-T2-7B, a new text-to-frontend-code model 🤩
you can easily fine-tune, quantize, and play with the sota vision LM InternVL3 now 🔥 we have recently merged InternVL3 into Hugging Face transformers and released converted checkpoints 🤗
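A minimal way to try it is through the transformers pipeline, as in the sketch below. The repo id is an assumption about the naming of the converted checkpoints; check the OpenGVLab org on the Hub for the exact ids.

```python
# Minimal sketch: running InternVL3 via the transformers image-text-to-text pipeline.
# The checkpoint name below is an assumption about the converted repos;
# verify the exact id on the Hub.
import torch
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="OpenGVLab/InternVL3-1B-hf",  # assumed converted checkpoint id
    torch_dtype=torch.bfloat16,
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
print(pipe(text=messages, max_new_tokens=64))
```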