
Omkar Pangarkar

omkarenator

AI & ML interests

None yet

Recent Activity

upvoted an article 10 days ago

nanoJAXGPT: A pedagogical introduction to JAX/Equinox

upvoted a paper about 1 month ago

Essential-Web v1.0: 24T tokens of organized web data

liked a Space 2 months ago

nanotron/predict_memory


Organizations

upvoted an article 10 days ago

Article

nanoJAXGPT: A pedagogical introduction to JAX/Equinox

and 2 others • Oct 23, 2024 • 5

upvoted a paper about 1 month ago

Essential-Web v1.0: 24T tokens of organized web data

Paper • 2506.14111 • Published Jun 17 • 41

liked a Space 2 months ago

Predict Memory

🧮

Analyze and visualize memory usage from model configurations

upvoted a paper 3 months ago

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published Apr 17 • 92

liked a dataset 3 months ago

WebOrganizer/Corpus-200B

Preview • Updated Feb 19 • 9.08k • 8

liked a Space 3 months ago

116

TxT360: Trillion Extracted Text

📖

Create a large-scale deduplicated text dataset for LLM training

liked a model 5 months ago

mlfoundations/fasttext-oh-eli5

Updated Aug 1, 2024 • 25

liked a Space 5 months ago

2.84k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

New activity in LLM360/TxT360 5 months ago

fix-deps

#7 opened 5 months ago by omkarenator

updated a Space 5 months ago

116

TxT360: Trillion Extracted Text

📖

Create a large-scale deduplicated text dataset for LLM training

New activity in LLM360/TxT360 5 months ago

code-formatting

#6 opened 5 months ago by omkarenator

liked a Space 6 months ago

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

📝

Evaluate multilingual models using FineTasks

upvoted an article 6 months ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

and 2 others • Jan 28 • 877

New activity in LLM360/TxT360 9 months ago

Add citations and other fixes

#4 opened 9 months ago by omkarenator

liked a dataset 9 months ago

LLM360/TxT360

Updated May 26 • 31.4k • 237

New activity in LLM360/TxT360 9 months ago

Update arxiv examples

#3 opened 9 months ago by zhoujun

upvoted an article 10 months ago

Article

Scaling AI-based Data Processing with Hugging Face + Dask

and 3 others • Oct 9, 2024 • 31

updated a Space 10 months ago

Fh New

📊

updated a Space 11 months ago

FineWeb: decanting the web for the finest text data at scale

🍷

Generate high-quality web text data for LLM training
