MHA2MLA-refactor
Collection
The MHA2MLA models published in the paper "Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs"
12 items
Research Paper "Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs"
Please refer to the GitHub repository MHA2MLA. The inference code is still being optimized; follow our subsequent work for updates. If you use this work, please cite:
@misc{ji2025economicalinferenceenablingdeepseeks,
      title={Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs},
      author={Tao Ji and Bin Guo and Yuanbin Wu and Qipeng Guo and Lixing Shen and Zhan Chen and Xipeng Qiu and Qi Zhang and Tao Gui},
      year={2025},
      eprint={2502.14837},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14837},
}
Base model: HuggingFaceTB/SmolLM-135M
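A minimal loading sketch, assuming the checkpoints in this collection are served through the standard Hugging Face `transformers` Auto classes. The repo id `fnlp/SmolLM-135M-MLA` below is a hypothetical placeholder; substitute one of the actual items in this collection. `trust_remote_code=True` is an assumption, since MLA attention may rely on custom modeling code; the MHA2MLA repository documents the authors' supported inference path.

```python
# Sketch: load an MHA2MLA checkpoint and run generation.
# Assumptions (not confirmed by this page): the repo id below is hypothetical,
# and trust_remote_code=True may be required for the MLA attention implementation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fnlp/SmolLM-135M-MLA"  # hypothetical; pick one of the 12 collection items
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Simple generation round trip to sanity-check the converted model.
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```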