So this is just an SFT "distill" of Magistral-Medium?

#6
by gghfez - opened

Hi, I'm just making sure I understand.

You basically did this: cognitivecomputations/Dolphin3.0-R1-Mistral-24B

But using Magistral-Medium to generate the traces, as opposed to DeepSeek-R1 like cognitivecomputations did?

https://mistral.ai/static/research/magistral.pdf

Read the paper. RL and then SFT on top.

Mistral AI_ org

Hi there, as mentioned in the paper, it was:

  • Mistral Medium + RL = Magistral Medium
  • Mistral Small + SFT (from Magistral Medium) + RL = Magistral Small

Both had RL.
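
For anyone skimming, the two recipes above can be written down as plain data. This is only a sketch: the dictionary layout and the helper names `describe` and `uses_rl` are made up for illustration and do not come from the paper or any Mistral tooling.

```python
# Illustrative only: stage labels mirror the reply above, not a real training API.
RECIPES = {
    "Magistral Medium": {"base": "Mistral Medium", "stages": ["RL"]},
    "Magistral Small": {
        "base": "Mistral Small",
        "stages": ["SFT (from Magistral Medium)", "RL"],
    },
}


def describe(name: str) -> str:
    """Render a recipe in the same 'base + stage + ... = model' form as the list."""
    recipe = RECIPES[name]
    return " + ".join([recipe["base"], *recipe["stages"]]) + " = " + name


def uses_rl(name: str) -> bool:
    """True if the recipe includes an RL stage."""
    return "RL" in RECIPES[name]["stages"]


for model in RECIPES:
    print(describe(model))
```

The only difference between the two entries is the extra SFT stage before RL for the Small model, so it is not a pure SFT distill.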

Thanks, I was feeling groggy / missed it when I read the paper the first time.
