My secret sauce:
- Using commit 3c16fd9 of 0cc4m's GPTQ fork
- Using PTB as the calibration dataset
- Act-order, true-sequential, percdamp 0.1 (the default percdamp is 0.01)
- No groupsize
- Runs with CUDA; does not need Triton
- Quantization was completed on a 'Premium GPU' and 'High Memory' Google Colab instance
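Given the settings above, the quantization command would have looked roughly like the following. This is a sketch only: the flag names are taken from upstream GPTQ-for-LLaMa and may differ in 0cc4m's fork at that commit, and `<model-dir>` / the output filename are placeholders, not the actual paths used.

```shell
# Sketch, assuming upstream GPTQ-for-LLaMa flag names; 0cc4m's fork may differ.
# <model-dir> is a placeholder for the FP16 model path.
python llama.py <model-dir> ptb \
    --wbits 4 \
    --true-sequential \
    --act-order \
    --percdamp 0.1 \
    --save_safetensors model-4bit.safetensors
```

No `--groupsize` flag is passed, matching the "no groupsize" choice above (upstream defaults to no grouping when the flag is omitted).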
Benchmark results:

| Model | C4 | WikiText2 | PTB |
|---|---|---|---|
| Aeala's FP16 | 7.05504846572876 | 4.662261962890625 | 24.547462463378906 |
| This Quant | 7.326207160949707 | 4.957101345062256 | 24.941526412963867 |
| Aeala's Quant here | 7.332120418548584 | 5.016242980957031 | 25.576189041137695 |
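For context, the relative perplexity increase of this quant over the FP16 baseline can be computed directly from the table values:

```python
# Perplexities copied from the benchmark table above.
fp16 = {"C4": 7.05504846572876, "WikiText2": 4.662261962890625, "PTB": 24.547462463378906}
quant = {"C4": 7.326207160949707, "WikiText2": 4.957101345062256, "PTB": 24.941526412963867}

for ds in fp16:
    # Percentage increase in perplexity relative to FP16 (lower is better).
    increase = (quant[ds] / fp16[ds] - 1) * 100
    print(f"{ds}: +{increase:.2f}% perplexity vs FP16")
```

The degradation is largest on WikiText2 (a bit over 6%) and smallest on PTB, which is expected since PTB was the calibration dataset.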