OctoThinker-Llama-3B Family

Mid-training Analysis Checkpoints (Llama-3.2-3B)

updated Jul 6

What makes a base language model suitable for RL? Through controlled experiments, we identify the key factors and then leverage them to scale up mid-training.

OctoThinker/OctoThinker-3B-Long-Base

Text Generation • 3B • Updated Jul 6 • 9.1k

OctoThinker/OctoThinker-3B-Hybrid-Base

Text Generation • 3B • Updated Jul 12 • 78

OctoThinker/OctoThinker-3B-Short-Base

Text Generation • 3B • Updated Jul 12 • 2.75k

OctoThinker/OctoThinker-3B-Long-Zero

Text Generation • 4B • Updated Jul 6 • 73

OctoThinker/OctoThinker-3B-Hybrid-Zero

Text Generation • 4B • Updated Jul 12 • 59 • 1

OctoThinker/OctoThinker-3B-Short-Zero

Text Generation • 4B • Updated Jul 12 • 10
