Multi-head latent attention (MLA) instead of Grouped query attention (GQA)
#18
opened by Rodeszones
May I ask why Multi-head latent attention (MLA) is not used instead of Grouped query attention (GQA) in the new models? Is there a problem with MLA?
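For context, here is a minimal, hypothetical sketch of what the two schemes cache per token, which is the main practical difference: GQA stores full keys/values for a reduced number of KV heads, while MLA (as in DeepSeek-V2) stores a single low-rank latent per token and re-projects it to per-head keys/values at attention time. All dimensions below are made-up examples, and the decoupled RoPE key used in DeepSeek-V2's MLA is omitted; this is not the code of the models discussed here.

```python
# Illustrative sketch only: compare KV-cache size under GQA vs. MLA.
import torch

d_model, n_heads, n_kv_heads, d_head, d_latent = 2048, 16, 4, 128, 512
seq_len = 1024

# --- GQA: cache full K/V, but for fewer heads than the queries use ---
k_gqa = torch.randn(seq_len, n_kv_heads, d_head)   # cached
v_gqa = torch.randn(seq_len, n_kv_heads, d_head)   # cached
gqa_cache = k_gqa.numel() + v_gqa.numel()

# --- MLA (DeepSeek-V2 style): cache one low-rank latent per token,
# --- then up-project it to per-head K/V when computing attention
c_kv = torch.randn(seq_len, d_latent)               # cached latent
w_uk = torch.randn(d_latent, n_heads * d_head)      # up-projection weights (not cached)
w_uv = torch.randn(d_latent, n_heads * d_head)
k_mla = (c_kv @ w_uk).view(seq_len, n_heads, d_head)
v_mla = (c_kv @ w_uv).view(seq_len, n_heads, d_head)
mla_cache = c_kv.numel()

print(f"GQA cache entries per layer: {gqa_cache}")  # 1,048,576 with these dims
print(f"MLA cache entries per layer: {mla_cache}")  #   524,288 with these dims
```

The trade-off is that MLA's smaller cache comes at the cost of the extra up-projections and a more involved design to derisk, which is what the reply below refers to.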
We didn't try MLA because we didn't have time to derisk it; we will for future models.
eliebak changed discussion status to closed