---
title: LLM KV Cache Calculator
emoji: 💻
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
short_description: Calculate KV cache memory requirements for LLMs
---

# KV Cache Calculator

Calculate KV cache memory requirements for transformer models.

## Credits

This implementation is derived from and builds on the original [kv-cache-calculator](https://huggingface.co/spaces/gaunernst/kv-cache-calculator) Space by gaunernst. Special thanks for the original implementation!

## Features

- **Multi-attention support**: MHA (Multi-Head Attention), GQA (Grouped Query Attention), and MLA (Multi-head Latent Attention)
- **Multiple data types**: fp16/bf16, fp8, and fp4 quantization
- **Real-time calculation**: Instant memory requirement estimates
- **Model analysis**: Detailed breakdown of model configuration
- **Universal compatibility**: Works with any Hugging Face transformer model
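
The estimate behind these features can be sketched in a few lines. This is a minimal illustration, not the code in `app.py`: the function name `kv_cache_bytes`, its parameters, and the `BYTES_PER_ELEMENT` table are assumptions made for this example. It assumes MHA and GQA store one key and one value vector per KV head, and that MLA stores a compressed latent of size `kv_lora_rank + qk_rope_head_dim` per token per layer.

```python
# Bytes per cached element for each supported precision (fp4 packs two values per byte).
BYTES_PER_ELEMENT = {"fp16/bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def kv_cache_bytes(num_layers, context_len, num_users, dtype, *,
                   num_kv_heads=None, head_dim=None,
                   kv_lora_rank=None, qk_rope_head_dim=None):
    """Estimate KV cache size in bytes (hypothetical helper, not the app's API)."""
    if kv_lora_rank is not None:
        # MLA: one compressed latent plus the decoupled RoPE key per token per layer.
        elements_per_token_per_layer = kv_lora_rank + qk_rope_head_dim
    else:
        # MHA / GQA: a key and a value vector for every KV head.
        elements_per_token_per_layer = 2 * num_kv_heads * head_dim
    total_elements = elements_per_token_per_layer * num_layers * context_len * num_users
    return total_elements * BYTES_PER_ELEMENT[dtype]


# Illustrative GQA model: 32 layers, 8 KV heads, head_dim 128, 4096 tokens, 1 user.
gqa_bytes = kv_cache_bytes(32, 4096, 1, "fp16/bf16", num_kv_heads=8, head_dim=128)
print(f"{gqa_bytes / 2**30:.2f} GiB")  # 0.50 GiB
```

For MHA the same formula applies with `num_kv_heads` equal to the number of attention heads, which is why GQA models need proportionally less cache.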

## Usage

1. Enter your model ID (e.g., "Qwen/Qwen3-30B-A3B")
2. Set context length and number of users
3. Choose data type precision
4. Add a Hugging Face access token if the model is gated
5. Click calculate to get memory requirements
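
As a rough sketch of what happens after step 1, the snippet below shows one way the attention type could be inferred from the fields of a model's `config.json`. The helper `describe_attention` and the sample values are hypothetical and are not taken from `app.py` or from any specific model; the field names (`num_hidden_layers`, `num_attention_heads`, `num_key_value_heads`, `head_dim`, `hidden_size`, `kv_lora_rank`) are ones commonly found in Hugging Face configs, though individual architectures may name them differently.

```python
def describe_attention(config: dict) -> dict:
    """Classify the attention variant from config.json-style fields (illustrative only)."""
    num_heads = config["num_attention_heads"]
    # Configs without num_key_value_heads are treated as plain multi-head attention.
    num_kv_heads = config.get("num_key_value_heads") or num_heads
    # Fall back to hidden_size / num_heads when head_dim is not given explicitly.
    head_dim = config.get("head_dim") or config["hidden_size"] // num_heads

    if config.get("kv_lora_rank") is not None:
        kind = "MLA"
    elif num_kv_heads == num_heads:
        kind = "MHA"
    else:
        kind = "GQA"  # multi-query attention (1 KV head) also lands here

    return {
        "attention": kind,
        "num_layers": config["num_hidden_layers"],
        "num_kv_heads": num_kv_heads,
        "head_dim": head_dim,
    }


# Made-up config values for demonstration.
sample = {
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "hidden_size": 4096,
}
print(describe_attention(sample))
```

In practice the dictionary would come from the model's configuration on the Hub, which is where the optional access token from step 4 is needed for gated models.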