---
license: apache-2.0
tags:
- reasoning
- mathematics
- reinforcement-learning
datasets:
- AIME
- AMC
- Omni-Math
base_model: R1-Distill-Qwen-1.5B
---

# ALP_R1_Qwen1.5B

R1-Distill-Qwen-1.5B trained with Adaptive Length Penalty (ALP) — reduces token usage by ~50% while maintaining performance.

## Training

- 100 steps of GRPO, batch size 512, LR 1e-6, β=1e-7
- 16 rollouts per prompt for difficulty estimation
- 8K context window

## Performance (Pass@1)

- MATH-500: 0.81
- AIME: 0.252
- OlympiadBench: 0.51

## Token Usage

- MATH: 2804 → 862 (-69%)
- AIME: 4007 → 3331 (-17%)
- Olympiad: 3606 → 2107 (-42%)

## Usage

```python
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."
```