R1-GRPO-Math-Python-Code-Experiments (Collection): LoRA and full fine-tuning experiments on R1 distills to generate Python code for math problems • 20 items • Updated 2 days ago