
This is a 4-bit GPTQ quant of https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b

My secret sauce (see the sketch after this list):

- Quantized with commit 3c16fd9 of 0cc4m's GPTQ fork
- PTB used as the calibration dataset
- Act-order, true-sequential, percdamp 0.1 (the default percdamp is 0.01)
- No groupsize
- Runs with CUDA; does not need Triton
- Quantization completed on a Google Colab instance with the 'Premium GPU' and 'High Memory' options
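
The following is a minimal sketch of what that invocation could look like, assuming 0cc4m's fork keeps upstream GPTQ-for-LLaMa's `llama.py` interface (positional calibration-dataset argument, `--wbits`, `--act-order`, `--true-sequential`, `--percdamp`, `--save_safetensors`). The paths and output filename are placeholders, not the exact command used here.

```python
# Hypothetical reconstruction of the quantization run described above.
# Flag names follow upstream GPTQ-for-LLaMa's llama.py; the fork may differ.
import subprocess

cmd = [
    "python", "llama.py",
    "/path/to/GPT4-x-AlpacaDente2-30b",  # placeholder: HF-format fp16 weights
    "ptb",                               # calibration dataset
    "--wbits", "4",                      # 4-bit quantization
    "--true-sequential",
    "--act-order",
    "--percdamp", "0.1",                 # raised from the 0.01 default
    # --groupsize is omitted, leaving the default of -1 (no groupsize)
    "--save_safetensors", "4bit.safetensors",  # placeholder output name
]
subprocess.run(cmd, check=True)
```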

Benchmark results (perplexity; lower is better)

| Model | C4 | WikiText2 | PTB |
| --- | --- | --- | --- |
| This quant | 7.326207160949707 | 4.957101345062256 | 24.941526412963867 |
| Aeala's quant (here) | x.xxxxxx | x.xxxxxx | x.xxxxxx |
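
These figures are GPTQ-style perplexities: the test split is tokenized into one long stream, scored in non-overlapping 2048-token windows, and the exponentiated mean token loss is reported. Below is a generic sketch of that evaluation for WikiText2, not the exact script used for the table; it loads the fp16 base weights purely to stay self-contained, whereas in practice the 4-bit checkpoint would be loaded through the GPTQ fork's own loader.

```python
# Generic GPTQ-style perplexity sketch (WikiText2); not the exact eval used above.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aeala/GPT4-x-AlpacaDente2-30b"  # fp16 base weights, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the test split into one token stream, then score 2048-token windows.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
input_ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids

seqlen = 2048
n_windows = input_ids.size(1) // seqlen
nlls = []
for i in range(n_windows):
    window = input_ids[:, i * seqlen : (i + 1) * seqlen].to(model.device)
    with torch.no_grad():
        # labels=window returns the mean next-token cross-entropy over the window
        loss = model(window, labels=window).loss
    nlls.append(loss.float() * seqlen)

ppl = torch.exp(torch.stack(nlls).sum() / (n_windows * seqlen))
print(f"WikiText2 perplexity: {ppl.item():.6f}")
```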