CURRENTLY UPLOADING: this notice will be removed once the upload is complete.

See Kimi-K2-Instruct-0905 Dynamic MLX in action: https://youtu.be/Ia-q3Ll4tAY

The q3.824bit dynamic quant typically achieves 1.256 perplexity in our testing, landing much closer to q4 perplexity (1.168) than to q3 perplexity (1.900).
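For context on the fractional figure: a dynamic (mixed-precision) quant assigns different bit widths to different layers, so the effective bits per weight is a parameter-weighted average. A minimal sketch with invented layer sizes, not the actual layout of this model:

```python
# Illustrative only: how a mixed-precision quant averages out to a
# fractional bit width. The layer sizes and bit assignments below are
# made up, not the real layout behind the 3.824-bit figure.
layers = [
    (1_000_000_000, 3),  # bulk of the weights at q3
    (200_000_000, 6),    # sensitive layers kept at higher precision
]
total_bits = sum(params * bits for params, bits in layers)
total_params = sum(params for params, _ in layers)
print(f"effective bits per weight: {total_bits / total_params:.3f}")  # 3.500
```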

| Quantization | Perplexity |
|--------------|------------|
| q2           | 41.293     |
| q3           | 1.900      |
| q3.824       | 1.256      |
| q3.985       | 1.243      |
| q4           | 1.168      |
| q5           | 1.141      |
| q6           | 1.128      |
| q8           | 1.128      |
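For reference, perplexity here follows the standard definition, exp of the mean negative log-likelihood over a token stream; the eval corpus behind the table is not specified in this card. A minimal sketch of the definition:

```python
import math

def perplexity(token_logprobs):
    """exp(mean negative log-likelihood) of the ground-truth tokens.

    token_logprobs: natural-log probabilities the model assigned to
    each reference token in the eval text.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns probability 0.8 to every token scores 1/0.8 = 1.25,
# in the same range as the q3.824 figure above.
print(perplexity([math.log(0.8)] * 100))
```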

Usage Notes

- Runs on a single M3 Ultra with 512 GB RAM using the Inferencer app
- Does not require expanding the VRAM limit
  - Expanding it does, however, allow larger context windows: `sudo sysctl iogpu.wired_limit_mb=507000`
- Expect ~20 tokens/s
- Quantized with a modified version of MLX 0.26
- For more details, see the demonstration video or visit Kimi K2; a minimal mlx-lm loading sketch follows after this list.
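Beyond the Inferencer app, MLX quants can usually also be loaded with the mlx-lm Python package. A minimal sketch, assuming `pip install mlx-lm`, sufficient unified memory, and that mlx-lm supports this architecture; the card itself only documents the Inferencer path:

```python
# Hedged sketch, not the documented workflow: load this repo with mlx-lm.
from mlx_lm import load, generate

model, tokenizer = load("inferencerlabs/Kimi-K2-Instruct-0905-MLX-3.824bit")

prompt = "Explain mixture-of-experts models in one paragraph."
# verbose=True streams tokens as they are generated and reports speed,
# which is handy for checking the ~20 tokens/s figure above.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```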