Models in the paper "Learning to Reason without External Rewards"
- sunblaze-ucb/Qwen2.5-3B-Intuitor-MATH-1EPOCH
  Text Generation • 3B • Updated • 1.85k
- sunblaze-ucb/Qwen2.5-1.5B-Intuitor-MATH-1EPOCH
  Text Generation • 2B • Updated • 98
- sunblaze-ucb/Qwen3-14B-Intuitor-MATH-1EPOCH
  Text Generation • 15B • Updated • 116
- sunblaze-ucb/OLMo-2-7B-SFT-Intuitor-MATH-1EPOCH
  Text Generation • 7B • Updated • 153