SmartQuant v1 of Llama-3.3-70B-Instruct at just 2.39 bpw.

At just 19.60 GB, it compares to these two reference quants:

  • Llama-3.3-70B-Instruct-IQ2_XS.gguf (IQ2_XS, 21.14 GB): low quality, uses SOTA techniques to be usable.
  • Llama-3.3-70B-Instruct-IQ2_XXS.gguf (IQ2_XXS, 19.10 GB): very low quality, uses SOTA techniques to be usable.
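The 2.39 bpw figure can be sanity-checked from the file size: bits per weight is simply the file size in bits divided by the parameter count. A minimal sketch, assuming the file size is reported in GiB and a parameter count of roughly 70.55B for Llama-3.3-70B (both are assumptions, not values from this card):

```python
def bits_per_weight(file_size_gib: float, n_params: float) -> float:
    """Average bits per weight: total file size in bits / parameter count."""
    return file_size_gib * 1024**3 * 8 / n_params

# 19.60 GiB file, ~70.55e9 parameters (approximate count; an assumption here)
print(round(bits_per_weight(19.60, 70.55e9), 2))  # ~2.39
```

Note that tools differ on GiB vs. GB when reporting sizes, so the result is only a consistency check, not an exact reproduction of the quantizer's own figure.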

I'll run some qualitative checks and perplexity measurements next.
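For the perplexity runs, llama.cpp ships a `llama-perplexity` tool; a typical invocation looks like the sketch below. The model filename and dataset path are placeholders, not files from this repo:

```shell
# Evaluate perplexity over a raw text corpus (e.g. the wikitext-2 test set).
# Both paths below are illustrative placeholders.
./llama-perplexity \
  -m Llama-3.3-70B-Instruct-SmartQuant-v1.gguf \
  -f wiki.test.raw \
  -c 2048
```

Comparing the resulting perplexity against the IQ2_XS and IQ2_XXS quants above, on the same corpus and context length, is what makes the size-vs-quality trade-off concrete.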
