Mayank Mishra

mayank-mishra

https://mayank31398.github.io/

AI & ML interests

Large Language Models, Distributed Training and Inference

Recent Activity

updated a model 17 days ago

mayank-mishra/hybrid-mamba2-7b

published a model 17 days ago

mayank-mishra/hybrid-mamba2-7b

upvoted a collection 19 days ago

Posts 4

Post

2122

New preprint out with colleagues from MIT and IBM Research

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (2405.12981)

We introduce a simple mechanism for sharing keys and values across layers, which reduces the memory needed for the KV cache during inference.
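To make the memory saving concrete, here is a rough back-of-the-envelope sketch of how KV cache size scales when groups of layers share one set of keys and values. It uses the standard cache-size arithmetic, not code from the paper; the function name, the `sharing_factor` parameter, and the model dimensions below are all hypothetical values chosen for illustration.

```python
import math


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2, sharing_factor=1):
    """Estimate KV cache size in bytes.

    sharing_factor is an illustrative parameter: the number of consecutive
    layers assumed to share a single set of keys and values
    (1 means no cross-layer sharing).
    """
    # Only one layer per sharing group needs to store keys and values.
    kv_layers = math.ceil(num_layers / sharing_factor)
    # Factor of 2 accounts for storing both keys and values.
    return (2 * kv_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape: 32 layers, 8 KV heads, head dim 128,
# 4096-token context, 16-bit cache entries.
baseline = kv_cache_bytes(32, 8, 128, 4096)
shared = kv_cache_bytes(32, 8, 128, 4096, sharing_factor=2)

print(f"no sharing:      {baseline / 2**20:.0f} MiB")
print(f"pairs of layers: {shared / 2**20:.0f} MiB")
```

Under these assumptions, sharing keys and values between pairs of layers halves the cache. The accuracy trade-off of doing so is a separate question that the preprint itself addresses.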

Articles 3

Article

41

Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2

Collections 1

Papers 24

arxiv:2505.22758

arxiv:2505.16381

arxiv:2502.09927

arxiv:2501.06589

Models 9

mayank-mishra/hybrid-mamba2-7b

7B • Updated 17 days ago • 16

mayank-mishra/granite-3b-code-glaive-20k

Text Generation • 3B • Updated Jun 5, 2024 • 6

mayank-mishra/granite-20b-code-instruct-Q4_K_M-GGUF

Text Generation • 20B • Updated May 19, 2024 • 23

mayank-mishra/starcoder-GPTQ-8bit-128g

Updated May 5, 2023 • 11

mayank-mishra/starcoder-GPTQ-4bit-128g

Updated May 5, 2023 • 16

mayank-mishra/starcoderbase-GPTQ-4bit-128g

Updated May 5, 2023 • 21

mayank-mishra/starcoderbase-GPTQ-8bit-128g

Updated May 5, 2023 • 3

mayank-mishra/santacoder-GPTQ-4bit-128g

Updated May 4, 2023 • 2

mayank-mishra/santacoder-GPTQ-8bit-128g

Updated May 4, 2023 • 1

Datasets 1

mayank-mishra/glaive-code-assisstant-v3-20k

Viewer • Updated Jun 5, 2024 • 20k • 28