OctoThinker-Llama-3B Family

updated Jul 6

What makes a base language model suitable for reinforcement learning (RL)? Through controlled experiments, we identify the key factors, then leverage them to scale up mid-training.

  • OctoThinker/OctoThinker-3B-Long-Base

    Text Generation • 3B • Updated Jul 6 • 9.1k

  • OctoThinker/OctoThinker-3B-Hybrid-Base

    Text Generation • 3B • Updated Jul 12 • 78

  • OctoThinker/OctoThinker-3B-Short-Base

    Text Generation • 3B • Updated Jul 12 • 2.75k

  • OctoThinker/OctoThinker-3B-Long-Zero

    Text Generation • 4B • Updated Jul 6 • 73

  • OctoThinker/OctoThinker-3B-Hybrid-Zero

    Text Generation • 4B • Updated Jul 12 • 59 • 1

  • OctoThinker/OctoThinker-3B-Short-Zero

    Text Generation • 4B • Updated Jul 12 • 10
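
The six repository names above follow a regular pattern (three variants, Long, Hybrid and Short, each in a Base and a Zero version). As a minimal sketch, the snippet below rebuilds those identifiers and shows one way they might be loaded. It assumes the repositories work with the standard `transformers` causal-LM loaders, which this listing does not confirm; `collection_repo_ids` and `load_model` are hypothetical helper names introduced only for illustration.

```python
from itertools import product

ORG = "OctoThinker"
SIZE = "3B"
BRANCHES = ("Long", "Hybrid", "Short")  # variants listed in this collection
STAGES = ("Base", "Zero")               # Base and Zero versions of each variant


def collection_repo_ids():
    """Rebuild the six repo ids in the order they appear in the listing."""
    return [
        f"{ORG}/{ORG}-{SIZE}-{branch}-{stage}"
        for stage, branch in product(STAGES, BRANCHES)
    ]


def load_model(repo_id):
    """Hypothetical helper: load a tokenizer and model for one repo id.

    Assumption: the repo is compatible with the generic Auto* loaders.
    transformers is imported lazily so the id helper above works without it.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


if __name__ == "__main__":
    for repo_id in collection_repo_ids():
        print(repo_id)
```

Calling `load_model` downloads weights from the Hub, so it is kept separate from the pure string helper, which can be used offline.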