GPTQ-quantized Gemma3 models
See https://arxiv.org/abs/2502.05003.
Models prequantized with [HIGGS](https://arxiv.org/abs/2411.17525) zero-shot quantization. Requires the latest `transformers` to run.
- Paper: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (arXiv:2411.17525)
- ISTA-DASLab/Llama-3.3-70B-Instruct-HIGGS-GPTQ-4bit (19B)
- ISTA-DASLab/Llama-3.1-8B-Instruct-HIGGS-GPTQ-4bit (Text Generation, 3B)
- ISTA-DASLab/Llama-3.1-8B-Instruct-HIGGS-GPTQ-3bit (Text Generation, 2B)
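The description above notes these checkpoints need a recent `transformers` release. A minimal loading sketch, assuming `transformers` is installed and a GPU with enough memory is available; the helper name is ours, and the actual download is deferred into the helper so the snippet stays light:

```python
# Sketch: load a HIGGS-prequantized checkpoint from the list above.
# Assumption: a recent `transformers` release that understands the
# prequantized format, plus a GPU with enough memory for the weights.
MODEL_ID = "ISTA-DASLab/Llama-3.1-8B-Instruct-HIGGS-GPTQ-4bit"

def load_higgs_model(model_id: str = MODEL_ID):
    """Download and instantiate the prequantized model (heavy operation)."""
    # Imported here so merely defining the helper needs no heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the dtype stored in the checkpoint
        device_map="auto",    # place layers across available devices
    )
    return tokenizer, model

if __name__ == "__main__":
    tok, model = load_higgs_model()
    inputs = tok("Hello", return_tensors="pt").to(model.device)
    print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

The same pattern should work for any of the three HIGGS repositories; only `MODEL_ID` changes.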
AQLM quantized LLMs
- Paper: Extreme Compression of Large Language Models via Additive Quantization (arXiv:2401.06118)
- ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16 (Text Generation, 11B)
- ISTA-DASLab/Meta-Llama-3-70B-AQLM-2Bit-1x16 (Text Generation, 11B)
- ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16 (Text Generation, 2B)
Official AQLM quantizations for "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression" (https://arxiv.org/abs/2405.14852).
- Paper: PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression (arXiv:2405.14852)
- ISTA-DASLab/Meta-Llama-3.1-70B-Instruct-AQLM-PV-2Bit-1x16 (Text Generation, 11B)
- ISTA-DASLab/Mistral-Nemo-Instruct-2407-AQLM-PV-2Bit-1x16-hf (3B)
- ISTA-DASLab/Meta-Llama-3.1-8B-Instruct-AQLM-PV-2Bit-1x16-hf (Text Generation, 2B)
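The AQLM and AQLM-PV checkpoints in the two collections above load through the standard `transformers` API. A minimal sketch, assuming the `aqlm` kernels package (`pip install aqlm[gpu]`) and a recent `transformers` release with AQLM support are installed; the helper name is illustrative, and loading is deferred so the snippet has no import-time dependencies:

```python
# Sketch: load a 2-bit AQLM-PV checkpoint from the collection above.
# Assumptions: `pip install aqlm[gpu]` for the inference kernels, a recent
# `transformers` with AQLM support, and a CUDA GPU.
MODEL_ID = "ISTA-DASLab/Meta-Llama-3.1-8B-Instruct-AQLM-PV-2Bit-1x16-hf"

def load_aqlm_model(model_id: str = MODEL_ID):
    """Instantiate the 2-bit model (heavy; the AQLM kernels need a GPU)."""
    # Imported here so merely defining the helper needs no heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the checkpoint's stored dtype
        device_map="auto",    # place layers on available GPU(s)
    )
    return tokenizer, model
```

Swapping in any other AQLM or AQLM-PV repository ID from the lists above should work the same way.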