I used Open R1 (by Hugging Face) to run supervised fine-tuning (SFT) on my earlier thinker models. The results are encouraging. Checkpoints are also included.

https://github.com/ewre324/open-r1/tree/main

This follows the DeepSeek R1 approach: train on a reasoning-specific dataset to encourage the model to think more before answering. The ... tags are still not generated. TODO.
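To track the TODO above, a small check on generated text can show whether a completion contains a reasoning block. The sketch below is only an illustration: the tag name `think` is an assumption borrowed from DeepSeek R1's output format (this card does not spell the tags out), and `extract_reasoning` is a hypothetical helper, not part of Open R1.

```python
import re


def extract_reasoning(text: str, tag: str = "think"):
    """Split generated text into (reasoning, answer).

    Assumption: reasoning is wrapped in <tag>...</tag>. The default tag
    name "think" is a guess based on DeepSeek R1; change it if the
    training data uses different tags.
    Returns (None, text) when no complete tag pair is found.
    """
    pattern = re.compile(
        rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", re.DOTALL
    )
    match = pattern.search(text)
    if match is None:
        # No reasoning block: this is the current behavior noted in the TODO.
        return None, text.strip()
    reasoning = match.group(1).strip()
    # The answer is whatever remains outside the first tag pair.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

Running this over a batch of completions and counting how often the first element is `None` would give a rough measure of how often the tags are missing.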

Format: Safetensors. Model size: 135M params. Tensor type: F32.
