R1 Distill
Collection
Collection of Distills using Open R1
Used Open R1 (by Hugging Face) to run SFT on my earlier thinker models. Results are encouraging. Checkpoints are also included.
https://github.com/ewre324/open-r1/tree/main
Follows the DeepSeek R1 method: train on a specific reasoning dataset to encourage more thinking. The ... tags are still not generated. TODO.
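One way to track the TODO above is to check generated text for a paired reasoning block. This is a minimal sketch, not part of Open R1: the helper name `has_tag_block` is hypothetical, and the tag strings are passed in as parameters because the exact tags are not given here. The `<tag>` and `</tag>` values in the usage lines are placeholders only.

```python
def has_tag_block(text: str, open_tag: str, close_tag: str) -> bool:
    """Return True if text contains open_tag, then non-blank content, then close_tag.

    Hypothetical helper: the tag strings are supplied by the caller.
    """
    start = text.find(open_tag)
    if start == -1:
        return False  # opening tag never generated
    body_start = start + len(open_tag)
    end = text.find(close_tag, body_start)
    if end == -1:
        return False  # block opened but never closed
    # Require some non-whitespace reasoning between the tags.
    return text[body_start:end].strip() != ""


# Placeholder tags for illustration; substitute the tags the model is trained on.
print(has_tag_block("<tag>2 + 2 is 4</tag> The answer is 4.", "<tag>", "</tag>"))
print(has_tag_block("The answer is 4.", "<tag>", "</tag>"))
```

Running this over a batch of sampled completions from each checkpoint gives a rough rate of how often the reasoning block appears.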
Base model
HuggingFaceTB/SmolLM2-135M