R1-GRPO-Math-Python-Code-Experiments

0-hero's Collections

updated May 11, 2025

LoRA and full fine-tuning experiments on R1 distills to generate Python code for math problems