datasets:
  - HuggingFaceH4/Bespoke-Stratos-17k
base_model:
  - ewre324/ewre324-Thinker-SmolLM2-135M-Instruct-Reasoning

Used Open R1 (by Hugging Face) to run supervised fine-tuning (SFT) on my earlier Thinker models. Results are encouraging. Checkpoints are also included.

https://github.com/ewre324/open-r1/tree/main

This follows the DeepSeek R1 approach of training on a dedicated reasoning dataset to encourage more thinking. The ... tags are still not generated. TODO.
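
One way to track the open TODO is to measure how often generations contain a reasoning block. The snippet below is a minimal sketch, not part of the training code. The tag names are not stated in this README, so the `<think>` / `</think>` defaults are an assumption based on the DeepSeek R1 convention, and the helper names `extract_reasoning` and `tag_rate` are hypothetical.

```python
import re


def extract_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Return the text between the first open_tag/close_tag pair, or None.

    The default tag names are an assumption (DeepSeek R1 convention);
    pass different tags if the model was trained with other delimiters.
    """
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def tag_rate(generations, open_tag="<think>", close_tag="</think>"):
    """Fraction of generations that contain a complete reasoning block."""
    if not generations:
        return 0.0
    hits = sum(
        extract_reasoning(g, open_tag, close_tag) is not None
        for g in generations
    )
    return hits / len(generations)
```

Running `tag_rate` over a batch of decoded model outputs gives a single number to compare across checkpoints; a value near zero would match the behaviour described above.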