Stephen Oates PRO

soates

AI & ML interests

None yet

Recent Activity

upvoted an article 24 days ago

Deriving the PPO Loss from First Principles

upvoted an article about 1 month ago

How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day

upvoted a collection about 1 month ago

Physics of Language Models: Part 4.2

Organizations

None yet

upvoted an article 24 days ago

Article

Deriving the PPO Loss from First Principles

26 days ago

upvoted an article about 1 month ago

Article

How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day

Dec 8, 2025

upvoted a collection about 1 month ago

Physics of Language Models: Part 4.2

Collection

16 items • Updated Jul 29, 2025 • 15

upvoted an article about 2 months ago

Article

We Got Claude to Fine-Tune an Open Source LLM

Dec 4, 2025 • 578

upvoted a paper 3 months ago

The Massive Legal Embedding Benchmark (MLEB)

Paper • 2510.19365 • Published Oct 22, 2025 • 17

upvoted an article 3 months ago

Article

Australian-made LLM beats OpenAI and Google at legal retrieval

Oct 23, 2025

upvoted an article 4 months ago

Article

There is no such thing as a tokenizer-free lunch

Sep 25, 2025

upvoted 2 papers 4 months ago

Virtual Agent Economies

Paper • 2509.10147 • Published Sep 12, 2025 • 26

The Majority is not always right: RL training for solution aggregation

Paper • 2509.06870 • Published Sep 8, 2025 • 16

upvoted 2 papers 8 months ago

Large Language Models are Locally Linear Mappings

Paper • 2505.24293 • Published May 30, 2025 • 14

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Paper • 2505.11711 • Published May 16, 2025 • 11

upvoted an article 8 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

May 21, 2025 • 248

upvoted an article 9 months ago

Article

Tiny Agents: an MCP-powered agent in 50 lines of code

Apr 25, 2025 • 305

upvoted a paper 9 months ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 139

upvoted an article 9 months ago

Article

Gotchas in Tokenizer Behavior Every Developer Should Know

Apr 18, 2025

upvoted a collection 10 months ago

Gemma 3

Collection

All versions of Google's new multimodal models including QAT in 1B, 4B, 12B, and 27B sizes. In GGUF, dynamic 4-bit and 16-bit formats. • 55 items • Updated 27 days ago • 103

upvoted 2 articles 12 months ago

Article

Open-R1: Update #1

Feb 2, 2025 • 305

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28, 2025 • 886

upvoted a collection 12 months ago

EvaByte

Collection

3 items • Updated Jan 21, 2025 • 4

upvoted an article about 1 year ago

Article

Mastering Tensor Dimensions in Transformers

Jan 12, 2025 • 130
