SmartQuant v1 of Llama-3.3-70B-Instruct at just 2.39 bpw.

At just 19.60 GB, it compares to these two reference quants:

  • Llama-3.3-70B-Instruct-IQ2_XS.gguf (IQ2_XS, 21.14 GB): low quality, uses SOTA techniques to be usable.
  • Llama-3.3-70B-Instruct-IQ2_XXS.gguf (IQ2_XXS, 19.10 GB): very low quality, uses SOTA techniques to be usable.
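The 2.39 bpw figure can be sanity-checked from the file size: bits per weight is simply the file size in bits divided by the parameter count. A minimal sketch, assuming the file size is reported in GiB and a parameter count of roughly 70.55B for Llama-3.3-70B (both are assumptions, not values from this card):

```python
def bits_per_weight(file_size_gib: float, n_params: float) -> float:
    """Average bits per weight: total file size in bits / parameter count."""
    return file_size_gib * 1024**3 * 8 / n_params

# 19.60 GiB file, ~70.55e9 parameters (approximate count; an assumption here)
print(round(bits_per_weight(19.60, 70.55e9), 2))  # ~2.39
```

Note that tools differ on GiB vs. GB when reporting sizes, so the result is only a consistency check, not an exact reproduction of the quantizer's own figure.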

I'll run some qualitative checks and perplexity measurements next.
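For the perplexity runs, llama.cpp ships a `llama-perplexity` tool; a typical invocation looks like the sketch below. The model filename and dataset path are placeholders, not files from this repo:

```shell
# Evaluate perplexity over a raw text corpus (e.g. the wikitext-2 test set).
# Both paths below are illustrative placeholders.
./llama-perplexity \
  -m Llama-3.3-70B-Instruct-SmartQuant-v1.gguf \
  -f wiki.test.raw \
  -c 2048
```

Comparing the resulting perplexity against the IQ2_XS and IQ2_XXS quants above, on the same corpus and context length, is what makes the size-vs-quality trade-off concrete.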
