omkarenator (Omkar Pangarkar)

upvoted an article 3 months ago

Mixture of Experts Explained

Article • Dec 11, 2023 • 1.07k

upvoted a collection 3 months ago

🤖 Agents

Collection

21 items • Updated Dec 31, 2024 • 173

upvoted an article 4 months ago

SmolLM3: smol, multilingual, long-context reasoner

Article • Jul 8, 2025 • 757

upvoted a paper 4 months ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 152

upvoted a collection 5 months ago

The Ultimate Collection of Code Classifiers

Collection

🔥 15 classifiers, 124M parameters, one per programming language, for assessing the educational value of GitHub code • 15 items • Updated May 5, 2025 • 15

upvoted a paper 7 months ago

Essential-Web v1.0: 24T tokens of organized web data

Paper • 2506.14111 • Published Jun 17, 2025 • 46

upvoted an article 8 months ago

nanoJAXGPT: A pedagogical introduction to JAX/Equinox

Article • Oct 23, 2024 • 7

upvoted a paper 10 months ago

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published Apr 17, 2025 • 93

upvoted an article about 1 year ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28, 2025 • 888

upvoted an article over 1 year ago

Scaling AI-based Data Processing with Hugging Face + Dask

Article • Oct 9, 2024 • 32

upvoted a paper over 1 year ago

Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler

Paper • 2408.13359 • Published Aug 23, 2024 • 23

upvoted 6 papers about 2 years ago

Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures

Paper • 2402.05424 • Published Feb 8, 2024 • 16

Omkar Pangarkar

AI & ML interests

Organizations

Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Transformers are Multi-State RNNs

The Stack: 3 TB of permissively licensed source code

LLM Augmented LLMs: Expanding Capabilities through Composition

LLM in a flash: Efficient Large Language Model Inference with Limited Memory
