CURRENTLY UPLOADING: this notice will be removed once the upload is complete.

See Kimi-K2-Instruct-0905 Dynamic MLX in action: https://youtu.be/Ia-q3Ll4tAY

The q3.824bit dynamic quant typically achieves 1.256 perplexity in our testing, landing much closer to q4 perplexity (1.168) than to q3 perplexity (1.900).
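For context on the fractional figure: a dynamic (mixed-precision) quant assigns different bit widths to different layers, so the effective bits per weight is a parameter-weighted average. A minimal sketch with invented layer sizes, not the actual layout of this model:

```python
# Illustrative only: how a mixed-precision quant averages out to a
# fractional bit width. The layer sizes and bit assignments below are
# made up, not the real layout behind the 3.824-bit figure.
layers = [
    (1_000_000_000, 3),  # bulk of the weights at q3
    (200_000_000, 6),    # sensitive layers kept at higher precision
]
total_bits = sum(params * bits for params, bits in layers)
total_params = sum(params for params, _ in layers)
print(f"effective bits per weight: {total_bits / total_params:.3f}")  # 3.500
```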

| Quantization | Perplexity |
|--------------|------------|
| q2           | 41.293     |
| q3           | 1.900      |
| q3.824       | 1.256      |
| q3.985       | 1.243      |
| q4           | 1.168      |
| q5           | 1.141      |
| q6           | 1.128      |
| q8           | 1.128      |
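For reference, perplexity here follows the standard definition, exp of the mean negative log-likelihood over a token stream; the eval corpus behind the table is not specified in this card. A minimal sketch of the definition:

```python
import math

def perplexity(token_logprobs):
    """exp(mean negative log-likelihood) of the ground-truth tokens.

    token_logprobs: natural-log probabilities the model assigned to
    each reference token in the eval text.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns probability 0.8 to every token scores 1/0.8 = 1.25,
# in the same range as the q3.824 figure above.
print(perplexity([math.log(0.8)] * 100))
```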

Usage Notes

- Runs on a single M3 Ultra with 512 GB RAM using the Inferencer app
- Does not require expanding the VRAM limit
  - Expanding it does, however, allow larger context windows: `sudo sysctl iogpu.wired_limit_mb=507000`
- Expect ~20 tokens/s
- Quantized with a modified version of MLX 0.26
- For more details, see the demonstration video or visit Kimi K2; a minimal mlx-lm loading sketch follows after this list.
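Beyond the Inferencer app, MLX quants can usually also be loaded with the mlx-lm Python package. A minimal sketch, assuming `pip install mlx-lm`, sufficient unified memory, and that mlx-lm supports this architecture; the card itself only documents the Inferencer path:

```python
# Hedged sketch, not the documented workflow: load this repo with mlx-lm.
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/Kimi-K2-Instruct-0905-MLX-3.824bit")

prompt = "Explain mixture-of-experts models in one paragraph."
# verbose=True streams tokens as they are generated and reports speed,
# which is handy for checking the ~20 tokens/s figure above.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```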