It's been a bit since I took a step back and looked at the xet-team's progress migrating Hugging Face from Git LFS to Xet, and every time I do, the numbers boggle the mind.
A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today? 700,000 users/orgs, 350,000 repos, 15PB.
Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577 Gb/s (crossing 500 Gb/s for the first time).
These are hard numbers to put into context, but let's try:
The latest Common Crawl release from commoncrawl was 471 TB.
We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours.
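For the curious, here's the back-of-the-envelope math behind that estimate, using the 471 TB and 577 Gb/s figures above:

```python
# Back-of-the-envelope: moving the latest crawl at peak Xet upload speed.
crawl_bytes = 471e12          # 471 TB
peak_bps = 577e9              # 577 Gb/s, in bits per second

seconds = crawl_bytes * 8 / peak_bps   # bytes -> bits, then divide by rate
print(f"{seconds / 3600:.1f} hours")   # -> 1.8 hours
```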
We're moving to a new phase in the process, so stay tuned.
This shift in gears means it's also time to roll up our sleeves and look at all the bytes we have and the value we're adding to the community.
I already have some homework from @RichardErkhov to look at the dedupe across their uploads, and I'll be doing the same for other early adopters, big models/datasets, and frequent uploaders (looking at you @bartowski)
Let me know if there's anything you're interested in; happy to dig in!
Let's go! Get on the waitlist; I can't wait to get you onboarded with Xet. From a customer onboarding last week: "Yeah, it's pretty seamless... oh wait, that was fast."
You can apply for yourself, or your entire organization. Head over to your account settings for more information, or sign up anywhere you see the Xet logo on a repository you visit.
Have questions? Join the conversation below or open a discussion on the Xet team page xet-team/README
I'm doing a lot of benchmarking and visualization work, which means I'm always searching for repos that are interesting in terms of file types, size, branches, and overall structure.
To help, I built a Space jsulz/repo-info that lets you search for any repo and get back:
- A treemap of the repository, color-coded by file/directory size
- Repo branches and their sizes
- Cumulative size of different file types (e.g., the total size of all the safetensors in the repo)
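If you want similar per-file-type numbers outside the Space, here's a minimal sketch using huggingface_hub's list_repo_tree. This is my own sketch, not the Space's actual code, and the repo id is just an example:

```python
from collections import defaultdict
from pathlib import PurePosixPath

from huggingface_hub import HfApi

# Sum a repo's file sizes by extension -- similar in spirit to the
# Space's per-file-type totals.
api = HfApi()
totals = defaultdict(int)
for entry in api.list_repo_tree("openai-community/gpt2", recursive=True):
    size = getattr(entry, "size", None)   # folder entries carry no size
    if size is not None:
        ext = PurePosixPath(entry.path).suffix or "(no extension)"
        totals[ext] += size

for ext, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{ext:>16}  {total / 1e6:,.1f} MB")
```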
And because I'm interested in how this fits into our work to leverage content-defined chunking for versioning repos on the Hub - https://huggingface.co/blog/from-files-to-chunks - everything also reports the number of chunks (1 chunk ≈ 64KB) as well as the total size in bytes.
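A quick aside on what a chunk is: content-defined chunking picks boundaries from the bytes themselves (via a rolling hash) rather than at fixed offsets, so an insertion early in a file only disturbs nearby chunks instead of shifting every boundary after it. Here's a toy sketch of the idea; the constants and the hash are my own illustrative choices, not the actual xet-core implementation, which uses a much faster rolling hash to target ~64KB average chunks:

```python
import hashlib
import os

TARGET = 64 * 1024        # aim for ~64KB average chunks
MASK = TARGET - 1         # boundary when hash & MASK == 0 (p = 1/65536 per byte)
WINDOW = 48               # only the trailing WINDOW bytes decide a boundary
MIN_CHUNK = 8 * 1024      # suppress tiny chunks
MAX_CHUNK = 4 * TARGET    # force a boundary eventually

def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks."""
    start = 0
    for i in range(len(data)):
        size = i - start + 1
        if size < MIN_CHUNK:
            continue
        # Hashing only a small trailing window keeps boundaries local:
        # bytes inserted earlier in the file don't shift later cut points.
        window = data[i - WINDOW + 1 : i + 1]
        h = int.from_bytes(hashlib.blake2b(window, digest_size=4).digest(), "big")
        if (h & MASK) == 0 or size >= MAX_CHUNK:
            yield start, i + 1
            start = i + 1
    if start < len(data):
        yield start, len(data)

data = os.urandom(1 << 20)                 # 1MB of random bytes
chunks = list(chunk_boundaries(data))
avg = len(data) // max(len(chunks), 1)
print(f"{len(chunks)} chunks, ~{avg} bytes each")
```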