Cursor: Hold my beer.
Me: *Slacking off with colleagues*
Cursor: Ping.
Me: π€―
`hf_xet` installed alongside the latest `huggingface_hub`
WOOHOO!!
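Nothing changes on the caller's side, as far as I can tell; a quick sketch to confirm the package came along with the upgrade (the repo id is just an example):

```python
import importlib.metadata

from huggingface_hub import hf_hub_download

# Confirm hf_xet was installed alongside huggingface_hub.
print("hf_xet", importlib.metadata.version("hf_xet"))

# Downloads work exactly as before; Xet-backed repos are
# handled through hf_xet transparently.
path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(path)
```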
`medium` size is now available as a power-user feature. For now it matches `large` (70GB VRAM), but this paves the way for:

1) `medium` will offer significantly more usage than `large`
2) an `xlarge` size (141GB VRAM)

Sizes: `auto` (future default), `medium`, `large` (current default)
Did you get it to work since then? Which provider do you use?
We'll ship a `provider="auto"` in the coming days BTW, cc @sbrandeis @Wauplin @celinah.

In the meantime, the model is served by those providers and you can use any one of them; for instance, add `provider="novita"` to your code:
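A minimal sketch with `InferenceClient` (the model id below is a placeholder; keep whichever model your code was already calling):

```python
from huggingface_hub import InferenceClient

# Pin the provider explicitly; once provider="auto" ships, this can be the default.
client = InferenceClient(provider="novita")

response = client.chat_completion(
    model="deepseek-ai/DeepSeek-V3-0324",  # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```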
Hey, things have been in flux somewhat, but they should stabilize now. Sorry about the moving parts!
More details from @michellehbn:

In February, Inference billing used a fixed rate while we added pay-as-you-go support. From March on, usage takes into account compute time × the price of the hardware. We're really sorry for any confusion or scare! We have more information about Inference Providers here: https://huggingface.co/docs/inference-providers/en/index
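To make the formula concrete, a quick sketch with made-up numbers (the rate below is hypothetical, not an actual Hugging Face price):

```python
# cost = compute time x hardware price (numbers are purely illustrative)
compute_seconds = 12.5      # total compute time your requests used
usd_per_second = 0.00012    # hypothetical hardware rate
cost = compute_seconds * usd_per_second
print(f"${cost:.4f}")       # -> $0.0015
```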
it's definitely the future :)