---
title: LLM KV Cache Calculator
emoji: 💻
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
short_description: Calculate KV cache memory requirements for LLMs
---
# KV Cache Calculator
Calculate KV cache memory requirements for transformer models.
## Credits
This implementation is derived from and builds upon the excellent work by [gaunernst](https://huggingface.co/spaces/gaunernst/kv-cache-calculator). Special thanks for the original implementation!
## Features
- **Multi-attention support**: MHA (Multi-Head Attention), GQA (Grouped Query Attention), and MLA (Multi-head Latent Attention)
- **Multiple data types**: fp16/bf16, fp8, and fp4 quantization
- **Real-time calculation**: Instant memory requirement estimates
- **Model analysis**: Detailed breakdown of model configuration
- **Universal compatibility**: Works with any Hugging Face transformer model
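
As a rough illustration of how the attention type and data type change the estimate, here is a minimal sketch of the per-token cache size. It assumes the commonly used formulas: MHA and GQA cache one key and one value vector per KV head per layer, while MLA caches one compressed latent plus one shared RoPE key per layer. The function names and the example shapes are hypothetical and are not taken from `app.py`, which may compute things differently.

```python
# Bytes per cached element for each data type (fp4 packs two values per byte).
BYTES_PER_ELEMENT = {"fp16/bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def kv_bytes_per_token_standard(num_layers, num_kv_heads, head_dim, dtype):
    """MHA/GQA: each layer stores one key and one value vector per KV head.

    For MHA, num_kv_heads equals the number of attention heads;
    for GQA it is smaller, which shrinks the cache proportionally.
    """
    return 2 * num_layers * num_kv_heads * head_dim * BYTES_PER_ELEMENT[dtype]


def kv_bytes_per_token_mla(num_layers, kv_lora_rank, qk_rope_head_dim, dtype):
    """MLA (assumed formula): each layer stores a compressed latent
    of size kv_lora_rank plus one shared RoPE key of size qk_rope_head_dim."""
    return num_layers * (kv_lora_rank + qk_rope_head_dim) * BYTES_PER_ELEMENT[dtype]


# Illustrative shapes only, not taken from any specific model.
mha = kv_bytes_per_token_standard(32, 32, 128, "fp16/bf16")  # 32 KV heads
gqa = kv_bytes_per_token_standard(32, 8, 128, "fp16/bf16")   # 8 KV heads
mla = kv_bytes_per_token_mla(32, 512, 64, "fp16/bf16")
fp8 = kv_bytes_per_token_standard(32, 8, 128, "fp8")

print(f"MHA: {mha:,.0f} bytes/token")
print(f"GQA: {gqa:,.0f} bytes/token")
print(f"MLA: {mla:,.0f} bytes/token")
print(f"GQA in fp8: {fp8:,.0f} bytes/token")
```

Under these assumed shapes, moving from MHA to GQA with a quarter of the KV heads cuts the per-token cache to a quarter, and halving the element width halves it again.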
## Usage
1. Enter your model ID (e.g., "Qwen/Qwen3-30B-A3B")
2. Set the context length and the number of users
3. Choose the data type precision
4. Add a Hugging Face token if the model is gated
5. Click calculate to get the memory requirements
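
The steps above can also be approximated offline. The sketch below takes a model configuration as a plain dictionary (the same field names that appear in a Hugging Face `config.json`), a context length, a user count, and a bytes-per-element value, and returns an estimate in GiB. The function `estimate_kv_cache_gib`, the rule for detecting MLA via `kv_lora_rank`, and the values in `example_config` are assumptions made for illustration; they are not the Space's actual code or the configuration of any particular model.

```python
def estimate_kv_cache_gib(config, context_length, num_users, bytes_per_element):
    """Estimate total KV cache size in GiB from config.json-style fields."""
    layers = config["num_hidden_layers"]

    if "kv_lora_rank" in config:
        # Assumed MLA layout: one compressed latent plus one RoPE key per layer.
        elements_per_token = layers * (
            config["kv_lora_rank"] + config["qk_rope_head_dim"]
        )
    else:
        heads = config["num_attention_heads"]
        # MHA configs often omit num_key_value_heads; fall back to all heads.
        kv_heads = config.get("num_key_value_heads", heads)
        # Some configs omit head_dim; derive it from the hidden size.
        head_dim = config.get("head_dim", config["hidden_size"] // heads)
        # Factor 2 covers the separate key and value tensors.
        elements_per_token = 2 * layers * kv_heads * head_dim

    total_bytes = elements_per_token * bytes_per_element * context_length * num_users
    return total_bytes / 1024**3


# Hypothetical GQA-style configuration used only to exercise the function.
example_config = {
    "num_hidden_layers": 48,
    "num_attention_heads": 32,
    "num_key_value_heads": 4,
    "head_dim": 128,
    "hidden_size": 2048,
}

# 32,768-token context, 4 users, 2 bytes per element (fp16/bf16).
size_gib = estimate_kv_cache_gib(example_config, 32768, 4, 2.0)
print(f"Estimated KV cache: {size_gib:.2f} GiB")
```

Because the estimate scales linearly with both context length and user count, doubling either one doubles the result.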