🐙 OctoThinker - a koalazf99 Collection

updated Jun 26

Mid-training Incentivizes Reinforcement Learning Scaling

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

Paper • 2506.20512 • Published Jun 25 • 46
OctoThinker/MegaMath-Web-Pro-Max

Viewer • Updated Jul 6 • 69.2M • 6.2k • 35
OctoThinker/OctoThinker-8B-Long-Base

Text Generation • 8B • Updated Jul 6 • 13
OctoThinker/OctoThinker-8B-Hybrid-Base

Text Generation • 8B • Updated Jul 6 • 84 • 2
OctoThinker/OctoThinker-8B-Short-Base

Text Generation • 8B • Updated Jul 6 • 2.17k
OctoThinker/OctoThinker-3B-Short-Zero

Text Generation • 4B • Updated Jul 12 • 10
OctoThinker/OctoThinker-3B-Hybrid-Zero

Text Generation • 4B • Updated Jul 12 • 59 • 1
OctoThinker/OctoThinker-1B-Long-Zero

Text Generation • 1B • Updated Jul 6 • 13
OctoThinker/OctoThinker-1B-Hybrid-Zero

Text Generation • 1B • Updated Jul 6 • 10
OctoThinker/OctoThinker-1B-Short-Zero

Text Generation • 1B • Updated Jul 6 • 10
OctoThinker/Llama3.2-3B-Zero

4B • Updated Apr 22 • 5
OctoThinker/OctoThinker-3B-Long-Zero

Text Generation • 4B • Updated Jul 6 • 73
OctoThinker/OctoThinker-1B-Long-Base

Text Generation • 1B • Updated Jul 6 • 10
OctoThinker/OctoThinker-1B-Short-Base

Text Generation • 1B • Updated Jul 6 • 20
OctoThinker/OctoThinker-1B-Hybrid-Base

Text Generation • 1B • Updated Jul 6 • 433
OctoThinker/OctoThinker-3B-Long-Base

Text Generation • 3B • Updated Jul 6 • 9.1k
OctoThinker/OctoThinker-3B-Hybrid-Base

Text Generation • 3B • Updated Jul 12 • 78
OctoThinker/OctoThinker-3B-Short-Base

Text Generation • 3B • Updated Jul 12 • 2.75k