I used Open R1 (by Hugging Face) to run supervised fine-tuning (SFT) on my earlier thinker models. The results are encouraging. Checkpoints are also included.

The training follows the DeepSeek R1 approach of fine-tuning on a dedicated reasoning dataset to encourage more thinking. The ... tags are still not generated. TODO.
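To track the TODO above, a small check like the following could be run over generated text to see whether a reasoning block was emitted. This is only a sketch: the tag names are not given in this card, so the `<think>` / `</think>` defaults are an assumption borrowed from the DeepSeek R1 convention, and the helper names `extract_reasoning` and `has_reasoning` are hypothetical.

```python
import re


def extract_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Return the text between the first open_tag/close_tag pair, or None.

    The default tag names are an assumption (DeepSeek R1 style); pass the
    tags your training data actually uses.
    """
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    # DOTALL lets the reasoning span multiple lines.
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def has_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """True if the text contains a complete open_tag ... close_tag block."""
    return extract_reasoning(text, open_tag, close_tag) is not None


if __name__ == "__main__":
    # Made-up strings for illustration, not real model outputs.
    samples = [
        "<think>2 + 2 = 4</think> The answer is 4.",
        "The answer is 4.",
    ]
    for sample in samples:
        print(has_reasoning(sample), repr(extract_reasoning(sample)))
```

Running this over a batch of generations gives a quick count of how often the tags appear, which makes it easier to compare checkpoints.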

Model: ewre324/ewre324-R1-SmolLM2-135M-Distill (Safetensors, 135M params, tensor type F32)
