README.md · ewre324/ewre324-R1-SmolLM2-135M-Distill at main

metadata

datasets:
  - HuggingFaceH4/Bespoke-Stratos-17k
base_model:
  - ewre324/ewre324-Thinker-SmolLM2-135M-Instruct-Reasoning

Used Open R1 (by Huggingface) to SFT my earlier thinker models. Encouraging results. Checkpoints also present.

Based on DeepSeek R1 based method to train on specific reasoning dataset to ensure more thinking. Still the ... tags are not generated. TODO.