koalazf99's Collections

🐙 OctoThinker

updated about 7 hours ago

Mid-training Incentivizes Reinforcement Learning Scaling

  • OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

    Paper • 2506.20512 • Published 1 day ago • 21

  • OctoThinker/MegaMath-Web-Pro-Max

    Updated about 11 hours ago • 5

  • OctoThinker/OctoThinker-8B-Long-Base

    Updated Apr 24 • 11

  • OctoThinker/OctoThinker-8B-Hybrid-Base

    Updated Apr 24 • 12 • 2

  • OctoThinker/OctoThinker-8B-Short-Base

    Updated Apr 24 • 46

  • OctoThinker/OctoThinker-3B-Short-Zero

    Updated Apr 23 • 8

  • OctoThinker/OctoThinker-3B-Hybrid-Zero

    Updated Apr 23 • 32

  • OctoThinker/OctoThinker-1B-Long-Zero

    Updated Apr 23 • 7

  • OctoThinker/OctoThinker-1B-Hybrid-Zero

    Updated Apr 23 • 7

  • OctoThinker/OctoThinker-1B-Short-Zero

    Updated Apr 23 • 41

  • OctoThinker/Llama3.2-3B-Zero

    Updated Apr 22 • 11

  • OctoThinker/OctoThinker-3B-Long-Zero

    Updated Apr 22 • 8

  • OctoThinker/OctoThinker-1B-Long-Base

    Updated Apr 22 • 19

  • OctoThinker/OctoThinker-1B-Short-Base

    Updated Apr 22 • 23

  • OctoThinker/OctoThinker-1B-Hybrid-Base

    Updated Apr 22 • 12

  • OctoThinker/OctoThinker-3B-Long-Base

    Updated Apr 22 • 22

  • OctoThinker/OctoThinker-3B-Hybrid-Base

    Updated Apr 22 • 7

  • OctoThinker/OctoThinker-3B-Short-Base

    Updated Apr 22 • 9