OctoThinker

community

https://github.com/GAIR-NLP/OctoThinker

GAIR-NLP

OctoThinker's collections (4)

Mid-training Analysis Checkpoints (Llama-3.2-3B)

What makes a base language model suitable for RL? Through controlled experiments, we identify the key factors and then leverage them to scale up mid-training.

OctoThinker/Llama_32_3B_finemath_4p_bs4M_seq8k_20B
Text Generation • Updated Jul 7, 2025

OctoThinker/Llama_32_3B_megamath_web_pro_bs4M_seq8k_20B
Text Generation • Updated Jul 7, 2025

OctoThinker/Llama_32_3B_megamath_web_pro_max_bs4M_seq8k_20B
Text Generation • Updated Jul 7, 2025

OctoThinker/Llama_32_3B_megamath_web_pro_megamath_synth_qa_31_bs4M_seq8k_20B
Updated Jul 3, 2025

OctoThinker-Llama-3B Family

OctoThinker/OctoThinker-3B-Long-Base
Text Generation • 3B • Updated Jul 6, 2025 • 8

OctoThinker/OctoThinker-3B-Hybrid-Base
Text Generation • 3B • Updated Jul 12, 2025 • 218 • 1

OctoThinker/OctoThinker-3B-Short-Base
Text Generation • 3B • Updated Jul 12, 2025 • 5

OctoThinker/OctoThinker-3B-Long-Zero
Text Generation • 4B • Updated Jul 6, 2025 • 57

OctoThinker-Llama-8B Family

OctoThinker/OctoThinker-8B-Long-Base
Text Generation • 8B • Updated Jul 6, 2025 • 9

OctoThinker/OctoThinker-8B-Hybrid-Base
Text Generation • 8B • Updated Jul 6, 2025 • 391 • 2

OctoThinker/OctoThinker-8B-Short-Base
Text Generation • 8B • Updated Jul 6, 2025 • 8 • 1

OctoThinker-Llama-1B Family

OctoThinker/OctoThinker-1B-Long-Base
Text Generation • 1B • Updated Jul 6, 2025 • 8

OctoThinker/OctoThinker-1B-Hybrid-Base
Text Generation • 1B • Updated Jul 6, 2025 • 4

OctoThinker/OctoThinker-1B-Short-Base
Text Generation • 1B • Updated Jul 6, 2025 • 9

OctoThinker/OctoThinker-1B-Long-Zero
Text Generation • 1B • Updated Jul 6, 2025 • 7
