So this is just an SFT "distill" of Magistral-Medium?
#6, opened by gghfez
Hi, I'm just making sure I understand.
You basically did the same thing as cognitivecomputations/Dolphin3.0-R1-Mistral-24B, but used Magistral-Medium to generate the traces instead of DeepSeek-R1, which is what cognitivecomputations used?
Hi there, as mentioned in the paper, it was:
- Mistral Medium + RL = Magistral Medium
- Mistral Small + SFT (from Magistral Medium) + RL = Magistral Small
Both had RL; Magistral Small just has the extra SFT step first.
Thanks, I was feeling groggy and missed it when I first read the paper.