Draft Model for Speculative Decoding

#4
by SupremeCmdr741 - opened

As the title says, I was wondering whether there is a draft model for speculative decoding: something a fraction of the size of the main model, to speed up generation.

Ready.Art org

Sorry it took so long to get to this.

I don't remember whether sleep built this model from scratch or based it on Cydonia 2.1.

If it is based on Cydonia, you can try this draft model: https://huggingface.co/alamios/Mistral-Small-3.1-DRAFT-0.5B

There is a good chance it will work.

Thank you!

No problem!

After looking at the merge information in more detail, yeah, that model should work as a draft.
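
For anyone wondering why the base model matters here: a draft only helps when it usually predicts the same next token as the big model. Below is a minimal toy sketch of greedy speculative decoding in plain Python. The names `toy_target`, `toy_draft`, and `speculative_greedy` are made up for illustration and are not part of any library, and the "models" are simple arithmetic stand-ins rather than real LLMs. It only shows the accept/reject logic, not how any particular backend implements it.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def toy_target(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for the large model's greedy next-token choice.
    return (ctx[-1] * 3 + 1) % 11


def toy_draft(ctx: List[Token]) -> Token:
    # Hypothetical stand-in for the small draft model: it agrees with the
    # target except when the context length is a multiple of 3.
    nxt = toy_target(ctx)
    return nxt if len(ctx) % 3 else (nxt + 1) % 11


def greedy(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_greedy(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Return (tokens, number of target verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The draft proposes k tokens, one after another (cheap).
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks every proposed position. A real engine does
        #    this in one batched forward pass, so we count it as one pass.
        passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target(out + proposal[:i])
            accepted.append(expected)
            if proposal[i] != expected:
                # First mismatch: keep the target's token, drop the rest.
                break
        else:
            # All k proposals matched, so the target's pass also yields
            # one extra token for free.
            accepted.append(target(out + proposal))
        out.extend(accepted)
    return out[: len(prompt) + max_new], passes
```

The output is identical to what the target would produce on its own, because every kept token is the target's own choice. The gain is that several tokens can be confirmed per target pass, so a draft that rarely agrees with the big model (for example, one trained for a different base model or tokenizer) gives little or no speedup.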

FrenzyBiscuit changed discussion status to closed
