Multi-head latent attention (MLA) instead of Grouped query attention (GQA)
#18
opened by Rodeszones
May I ask why Multi-head latent attention (MLA) is not used instead of Grouped query attention (GQA) in the new models? Is there a problem with MLA?
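For context, here is a minimal, hypothetical sketch of what the two schemes cache per token, which is the main practical difference: GQA stores full keys/values for a reduced number of KV heads, while MLA (as in DeepSeek-V2) stores a single low-rank latent per token and re-projects it to per-head keys/values at attention time. All dimensions below are made-up examples, and the decoupled RoPE key used in DeepSeek-V2's MLA is omitted; this is not the code of the models discussed here.

```python
# Illustrative sketch only: compare KV-cache size under GQA vs. MLA.
import torch

d_model, n_heads, n_kv_heads, d_head, d_latent = 2048, 16, 4, 128, 512
seq_len = 1024

# --- GQA: cache full K/V, but for fewer heads than the queries use ---
k_gqa = torch.randn(seq_len, n_kv_heads, d_head)   # cached
v_gqa = torch.randn(seq_len, n_kv_heads, d_head)   # cached
gqa_cache = k_gqa.numel() + v_gqa.numel()

# --- MLA (DeepSeek-V2 style): cache one low-rank latent per token,
# --- then up-project it to per-head K/V when computing attention
c_kv = torch.randn(seq_len, d_latent)               # cached latent
w_uk = torch.randn(d_latent, n_heads * d_head)      # up-projection weights (not cached)
w_uv = torch.randn(d_latent, n_heads * d_head)
k_mla = (c_kv @ w_uk).view(seq_len, n_heads, d_head)
v_mla = (c_kv @ w_uv).view(seq_len, n_heads, d_head)
mla_cache = c_kv.numel()

print(f"GQA cache entries per layer: {gqa_cache}")  # 1,048,576 with these dims
print(f"MLA cache entries per layer: {mla_cache}")  #   524,288 with these dims
```

The trade-off is that MLA's smaller cache comes at the cost of the extra up-projections and a more involved design to derisk, which is what the reply below refers to.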
We didn't try MLA because we didn't have time to derisk it; we will for future models.
eliebak changed discussion status to closed