SmolLM-135M-MLA-d_kv_8-refactor

Tags: Text Generation · Safetensors · llama

This model accompanies the research paper "Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs" (arXiv:2502.14837).

Usage

Please refer to the MHA2MLA GitHub repository. The inference code is still being optimized; follow our subsequent work for updates.
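
The snippet below is a minimal loading sketch, not the project's official inference path: it assumes the checkpoint can be loaded through the standard transformers AutoModelForCausalLM API, with trust_remote_code=True in case the MLA layers require custom modeling code. For the tested and optimized code, use the MHA2MLA repository.

```python
# Minimal loading sketch (assumption: the checkpoint works with the standard
# transformers API; the MHA2MLA repository remains the supported path).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fnlp/SmolLM-135M-MLA-d_kv_8-refactor"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint ships F32 and BF16 tensors
    trust_remote_code=True,      # in case the MLA layers need custom code
)

prompt = "Multi-head latent attention reduces the KV cache by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```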

Citation

```bibtex
@misc{ji2025economicalinferenceenablingdeepseeks,
      title={Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs},
      author={Tao Ji and Bin Guo and Yuanbin Wu and Qipeng Guo and Lixing Shen and Zhan Chen and Xipeng Qiu and Qi Zhang and Tao Gui},
      year={2025},
      eprint={2502.14837},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.14837},
}
```

Model details

Format: Safetensors
Model size: 129M params
Tensor types: F32 · BF16
