Multi-head latent attention (MLA) instead of Grouped query attention (GQA)

#18
by Rodeszones - opened

May I ask why Multi-head latent attention (MLA) was not used instead of Grouped query attention (GQA) in the new models? Is there a problem with MLA?
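
For context, here is a minimal, illustrative sketch (not SmolLM code, and with made-up dimensions) of the trade-off the question points at: GQA caches full keys/values for a reduced number of KV heads, while MLA caches only a low-rank latent per token and reconstructs K/V from it. RoPE handling in the real MLA design is omitted for brevity.

```python
# Illustrative comparison of per-token KV-cache cost for GQA vs. MLA.
# All dimensions below are hypothetical examples, not SmolLM's actual config.
import torch
import torch.nn as nn

d_model, n_heads, n_kv_heads, head_dim, d_latent = 2048, 32, 8, 64, 512

class GQAKVCache(nn.Module):
    """GQA: store full K and V for a reduced number of KV heads."""
    def __init__(self):
        super().__init__()
        self.w_k = nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
        self.w_v = nn.Linear(d_model, n_kv_heads * head_dim, bias=False)

    def cached_per_token(self, x):
        # Per token, the cache holds K and V for every KV head.
        k, v = self.w_k(x), self.w_v(x)
        return k.shape[-1] + v.shape[-1]   # 2 * n_kv_heads * head_dim

class MLAKVCache(nn.Module):
    """MLA: store only a shared low-rank latent; K/V are reconstructed
    at attention time (decoupled RoPE keys omitted for brevity)."""
    def __init__(self):
        super().__init__()
        self.w_down = nn.Linear(d_model, d_latent, bias=False)           # compress
        self.w_up_k = nn.Linear(d_latent, n_heads * head_dim, bias=False)  # expand K
        self.w_up_v = nn.Linear(d_latent, n_heads * head_dim, bias=False)  # expand V

    def cached_per_token(self, x):
        # Only the latent vector is kept in the KV cache.
        c = self.w_down(x)
        return c.shape[-1]                 # d_latent

x = torch.randn(1, d_model)
print("GQA cache floats/token:", GQAKVCache().cached_per_token(x))  # 1024
print("MLA cache floats/token:", MLAKVCache().cached_per_token(x))  # 512
```

With these example sizes, MLA halves the per-token KV-cache footprint relative to GQA, at the cost of the extra up-projections at attention time.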

Hugging Face Smol Models Research org

We didn't try MLA because we didn't have time to de-risk it; we will for future models.

eliebak changed discussion status to closed
