SmolLM2-135M R1 Distill

#5
by ewre324 - opened

Hello, I just used SFT to produce an R1 distill.
https://huggingface.co/ewre324/ewre324-R1-SmolLM2-135M-Distill

Please try it out and leave a comment if possible.

I think the downside of thinking models is that even for simple questions they may spend a lot of thinking tokens. I think we should have a dataset that trains LLMs to figure out when to use a thinking strategy and when to simply answer the question like regular LLMs do.
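To make the dataset idea a bit more concrete, here is a rough sketch of how mixed "think" and "no-think" SFT records might be formatted. It is only an illustration: the `Example` class, `to_sft_record`, `has_thought`, and `thinking_fraction` are hypothetical names, the `prompt`/`completion` record layout is an assumption, and the `<think>...</think>` delimiters follow the R1-style convention rather than anything confirmed for this particular distill.

```python
from dataclasses import dataclass

# Assumption: R1-style reasoning delimiters. Adjust to whatever the target model uses.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


@dataclass
class Example:
    """Hypothetical container for one training sample."""
    question: str
    answer: str
    reasoning: str = ""  # empty string means "answer directly, no thinking"


def to_sft_record(ex: Example) -> dict:
    """Format one example as a prompt/completion pair.

    The think block is always emitted, even when empty, so the model can
    learn that closing it immediately is a valid choice for easy questions.
    """
    thought = ex.reasoning.strip()
    if thought:
        completion = f"{THINK_OPEN}\n{thought}\n{THINK_CLOSE}\n{ex.answer}"
    else:
        completion = f"{THINK_OPEN}\n{THINK_CLOSE}\n{ex.answer}"
    return {"prompt": ex.question, "completion": completion}


def has_thought(completion: str) -> bool:
    """Return True if the think block contains any non-whitespace text."""
    body = completion.split(THINK_OPEN, 1)[1].split(THINK_CLOSE, 1)[0]
    return bool(body.strip())


def thinking_fraction(records: list) -> float:
    """Share of records that actually use the think block."""
    if not records:
        return 0.0
    return sum(has_thought(r["completion"]) for r in records) / len(records)


examples = [
    # Simple factual question: no reasoning needed.
    Example("What is the capital of France?", "Paris"),
    # Multi-step arithmetic: reasoning is worth the extra tokens.
    Example(
        "What is 17 * 24?",
        "408",
        "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.",
    ),
]

records = [to_sft_record(ex) for ex in examples]
```

The hard part, deciding which questions deserve reasoning, is not solved here; the labels are simply supplied by hand. Something like `thinking_fraction(records)` could help keep an eye on the balance between the two kinds of samples while building such a dataset.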
