CURRENTLY UPLOADING: this notice will be removed once the upload is complete.
See Kimi-K2-Instruct-0905 Dynamic MLX in action: https://youtu.be/Ia-q3Ll4tAY
The q3.824-bit dynamic quant typically achieves a perplexity of 1.256 in our testing, much closer to q4 perplexity (1.168) than to q3 perplexity (1.900).
| Quantization | Perplexity |
|---|---|
| q2 | 41.293 |
| q3 | 1.900 |
| q3.824 | 1.256 |
| q3.985 | 1.243 |
| q4 | 1.168 |
| q5 | 1.141 |
| q6 | 1.128 |
| q8 | 1.128 |
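For reference, here is a minimal sketch of how per-token perplexity can be measured on an MLX quant using the open-source mlx_lm package. The table above may have been produced with a different harness and evaluation text, so treat the model path, sample text, and method below purely as illustrative assumptions.

```python
# Illustrative sketch (not this card's evaluation harness): compute
# per-token perplexity of an MLX quant with mlx_lm. The sample text
# is a placeholder for demonstration only.
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("inferencerlabs/Kimi-K2-Instruct-0905-MLX-3.824bit")

text = "The quick brown fox jumps over the lazy dog."
tokens = mx.array(tokenizer.encode(text))[None]        # shape (1, seq_len)

logits = model(tokens[:, :-1])                         # next-token logits
nll = nn.losses.cross_entropy(logits, tokens[:, 1:])   # per-token loss
print(f"perplexity: {mx.exp(nll.mean()).item():.3f}")
```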
Usage Notes
- Runs on a single M3 Ultra with 512 GB of RAM using the Inferencer app (a minimal mlx_lm loading sketch follows this list)
- Does not require expanding the VRAM limit
- Expanding it does, however, allow larger context windows: `sudo sysctl iogpu.wired_limit_mb=507000`
- Expect ~20 tokens/s
- Quantized with a modified version of MLX 0.26
- For more details, see the demonstration video or visit the Kimi K2 page.
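The notes above use the Inferencer app, but the quant should also load with the open-source mlx_lm package. The sketch below is a hedged assumption: the prompt text and sampling settings are illustrative choices, not recommendations from this card.

```python
# Illustrative sketch, assuming mlx_lm can load this quant directly
# (the card itself demonstrates the Inferencer app).
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/Kimi-K2-Instruct-0905-MLX-3.824bit")

messages = [{"role": "user", "content": "Summarize dynamic quantization in two sentences."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Expect roughly 20 tokens/s on an M3 Ultra per the notes above.
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```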
Model tree for inferencerlabs/Kimi-K2-Instruct-0905-MLX-3.824bit
Base model: moonshotai/Kimi-K2-Instruct-0905