Meta dropped Swiss army knives for vision with an Apache 2.0 license 👏
> image/video encoders for vision language modelling and spatial understanding (object detection etc.) 👏
> The vision LM outperforms InternVL3 and Qwen2.5VL 👏
> They also release gigantic video and image datasets
The authors set out to build a single versatile vision encoder that can be aligned to a diverse set of tasks.
They trained Perception Encoder (PE) Core: a new state-of-the-art family of vision encoders that can be aligned for both vision-language and spatial tasks. On zero-shot image tasks it outperforms the latest SOTA, SigLIP2 👏
> Among the fine-tuned variants, the first is PE-Spatial: a model for bounding box detection, segmentation and depth estimation, and it outperforms all other models 😮
> The second is PLM, the Perception Language Model, where they combine PE-Core with the Qwen2.5 7B LM. It outperforms all other models (including InternVL3, which was also trained with a Qwen2.5 LM!)
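If you want to try a PLM-style model, transformers' image-text-to-text pipeline is the natural entry point. A minimal sketch below uses a small LLaVA checkpoint as a stand-in, since the post doesn't give the PLM repo id; swap in the PLM checkpoint once it's supported in transformers:

```python
from transformers import pipeline

# Stand-in checkpoint: replace with the PLM repo id once it is available in transformers.
pipe = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
            {"type": "text", "text": "Describe the scene and say where the animal is."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=64, return_full_text=False)
print(out[0]["generated_text"])
```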
The authors release the checkpoints in base, large and giant sizes.
The authors release the following datasets 📑
> PE Video: a gigantic video dataset of 1M videos with 120k expert annotations ⏯️
> PLM-Video and PLM-Image: human- and auto-annotated image and video datasets for region-based tasks
> PLM-VideoBench: a new video benchmark on MCQA
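The datasets should be loadable with 🤗 datasets; the repo id below is a guess based on the name in the post, so check Meta's org on the Hub for the exact one:

```python
from datasets import load_dataset

# "facebook/PE-Video" is a guessed repo id; verify the exact name on the Hub.
# Streaming avoids downloading 1M videos just to peek at a sample.
pev = load_dataset("facebook/PE-Video", split="train", streaming=True)
sample = next(iter(pev))
print(sample.keys())
```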
We’re starting from the foundations of modern generative AI by looking at transformers. This chapter has been expanded in depth and features, so it contains new material like:
- a FREE and CERTIFIED exam on the fundamentals of transformers
- a deeper exploration of transformer architectures and attention mechanisms
- an end-to-end exploration of inference strategies for prefill and decode steps (sketched below)
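The prefill/decode split is easy to see in plain transformers code: one forward pass builds the KV cache over the whole prompt, then decoding feeds one token at a time against that cache. A minimal sketch (SmolLM2-135M is just a convenient small model, not part of the course material):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M"  # any small causal LM works for the demo
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

inputs = tok("The prefill step processes the whole prompt", return_tensors="pt")

# Prefill: a single forward pass over the full prompt, caching keys/values per layer.
with torch.no_grad():
    out = model(**inputs, use_cache=True)
past = out.past_key_values
next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)

# Decode: one token per step, reusing the cache so each step only computes new keys/values.
generated = [next_id]
with torch.no_grad():
    for _ in range(20):
        out = model(input_ids=next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0]))
```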
The course has leveled up in complexity and depth, so this is a great time to join in if you want to build your own AI models.
Most vision LMs focus on the image as a whole, lacking localized references in their captions and not taking in visual prompts (points, boxes, drawings around objects).
DAM addresses this on two levels: a new vision backbone that takes in focal crops alongside the image itself, and a large-scale dataset 👀
They generate the dataset by extending existing segmentation and referring expression generation datasets like REFCOCO: they pass the images and classes to VLMs and have them generate captions.
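That recipe is easy to reproduce at small scale: crop the annotated region, then send the full image plus the crop to a hosted VLM. A rough sketch via huggingface_hub's chat-completion client; the Qwen2.5-VL model id, the box coordinates and the prompt are my own placeholders, not the paper's actual pipeline:

```python
import base64, io
import requests
from PIL import Image
from huggingface_hub import InferenceClient

def to_data_url(img: Image.Image) -> str:
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode()

# Placeholder image and a COCO-style (x, y, w, h) box for the region to describe.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
x, y, w, h = 100, 80, 220, 200
crop = image.crop((x, y, x + w, y + h))

client = InferenceClient()  # reads HF_TOKEN from the environment
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "The second image is a crop of the first. Write a detailed caption of the object in the crop and where it sits in the full image."},
        {"type": "image_url", "image_url": {"url": to_data_url(image)}},
        {"type": "image_url", "image_url": {"url": to_data_url(crop)}},
    ],
}]
resp = client.chat.completions.create(model="Qwen/Qwen2.5-VL-7B-Instruct", messages=messages, max_tokens=200)
print(resp.choices[0].message.content)
```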
Lastly, they also release a new benchmark, again with self-supervision: they use an LLM to evaluate the detailed captions, focusing on localization 👏
The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:
- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets
- Shows 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model
It's created using the Curator framework with plans to scale across more scientific domains and incorporate multi-modal reasoning with charts and mathematics.
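A toy version of the recipe (not the actual Curator pipeline; the teacher model id, persona prompt and token budgets are all my own placeholders) looks roughly like this:

```python
from huggingface_hub import InferenceClient

client = InferenceClient()  # reads HF_TOKEN from the environment

PERSONA = (
    "You are the paper's lead researcher *before* running any experiments. "
    "Think out loud about how you would approach the problem."
)

def reasoning_chain(abstract: str, style: str = "single-long", budget: int = 800) -> str:
    """Ask an LLM for a reasoning trace in one of the two formats mentioned above."""
    prompt = (
        f"{PERSONA}\n\nPaper abstract:\n{abstract}\n\n"
        f"Produce a {style} reasoning trace (max ~{budget} tokens), separating the "
        "logical structure of the argument from your intuition about it."
    )
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder teacher model
        messages=[{"role": "user", "content": prompt}],
        max_tokens=budget,
    )
    return resp.choices[0].message.content

print(reasoning_chain("We study how minimum wage changes affect employment at small firms..."))
```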
I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.
smolagents v1.14.0 is out! 🚀
🔌 MCPClient: a sleek new client for connecting to remote MCP servers, making integrations more flexible and scalable.
🪨 Amazon Bedrock: native support for Bedrock-hosted models.
smolagents is now more powerful, flexible, and enterprise-ready. 💼
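Roughly how the two new pieces fit together (a sketch only: the MCP server URL is made up, and the MCPClient / AmazonBedrockServerModel usage is my reading of the release notes, so double-check the smolagents docs):

```python
from smolagents import CodeAgent, MCPClient, AmazonBedrockServerModel

# Assumes an MCP server already running locally and AWS credentials configured for Bedrock.
with MCPClient({"url": "http://127.0.0.1:8000/sse"}) as tools:
    model = AmazonBedrockServerModel(model_id="us.amazon.nova-pro-v1:0")  # any Bedrock model id
    agent = CodeAgent(tools=tools, model=model)
    agent.run("Use the available tools to summarize what this MCP server exposes.")
```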
Hacked my presentation building with inference providers, Cohere Command A, and sheer simplicity. Use this script if you’re burning too much time on presentations:
This is what it does:
- uses Command A to generate slides and speaker notes based on some material
- renders the material in the remark open format and imports all images, tables, etc.
- you can then review the slides as markdown and iterate
- export to either PDF or PPTX using backslide
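The core of it fits in a few lines. The sketch below assumes the Cohere provider on Inference Providers and the CohereLabs/c4ai-command-a-03-2025 repo id (both worth double-checking), and leaves the export step to the backslide CLI:

```python
import os
from huggingface_hub import InferenceClient

# Command A through the Cohere inference provider; the model id is an assumption.
client = InferenceClient(provider="cohere", api_key=os.environ["HF_TOKEN"])

material = open("notes.md").read()  # whatever material you want slides for
prompt = (
    "Turn the following material into remark markdown slides separated by '---', "
    "with speaker notes after '???' on each slide:\n\n" + material
)
resp = client.chat.completions.create(
    model="CohereLabs/c4ai-command-a-03-2025",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=2000,
)

with open("presentation.md", "w") as f:
    f.write(resp.choices[0].message.content)
# Review/iterate on the markdown, then export with backslide (e.g. `bs export` or `bs pdf`).
```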
🚀 Next steps: add text-to-speech for the audio and generate a video. This should help Hugging Face educational content scale to a billion AI learners.