askmyteapot/GPT4-X-Alpasta-30b-4bit


This is a 4-bit quant of https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b

My secret sauce:

Using commit 3c16fd9 of 0cc4m's GPTQ fork
Using C4 as the calibration dataset
Act-order, True-sequential, percdamp 0.1 (the default percdamp is 0.01)
No groupsize
Runs with CUDA; does not need Triton.
Quantized on a Google Colab instance with the 'Premium GPU' and 'High Memory' options.
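
For anyone wondering what the percdamp setting above changes: in the GPTQ reference code, as far as I understand it, percdamp is a fraction of the mean Hessian diagonal that gets added back onto the diagonal before the Hessian is inverted, which keeps the inverse numerically stable. The sketch below shows only that dampening step in plain Python. It is not taken from 0cc4m's fork; `dampen_hessian` and `toy_hessian` are hypothetical names, and the matrix values are made up.

```python
def dampen_hessian(hessian, percdamp=0.01):
    """Return a copy of `hessian` with percdamp * mean(diagonal) added to each diagonal entry.

    Assumption: this mirrors the dampening step in the GPTQ reference code;
    it has not been checked against the specific fork commit used for this quant.
    """
    n = len(hessian)
    mean_diag = sum(hessian[i][i] for i in range(n)) / n
    damp = percdamp * mean_diag
    return [
        [value + damp if row == col else value for col, value in enumerate(values)]
        for row, values in enumerate(hessian)
    ]


# Toy 2x2 Hessian, invented purely for illustration.
toy_hessian = [[4.0, 1.0], [1.0, 2.0]]

default_damped = dampen_hessian(toy_hessian, percdamp=0.01)  # the stated default
card_damped = dampen_hessian(toy_hessian, percdamp=0.1)      # the value used for this quant

print(default_damped)
print(card_damped)
```

With percdamp 0.1 the term added to the diagonal is ten times larger than with the default, so the Hessian is regularized more strongly before inversion.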

Benchmark results (perplexity, lower is better)

Model	C4	WikiText2	PTB
MetaIX's FP16	6.98400259	4.607768536	9.414786339
This Quant	7.292364597	4.954069614	9.754593849
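
To put the table in relative terms, here is a small sketch that computes how much each score rises going from FP16 to this quant. The numbers are copied from the table; the only assumption is that these are perplexities, where higher means worse. The names `scores` and `relative_increase_percent` are just for this example.

```python
# (FP16 score, 4-bit quant score) per dataset, copied from the table above.
scores = {
    "C4": (6.98400259, 7.292364597),
    "WikiText2": (4.607768536, 4.954069614),
    "PTB": (9.414786339, 9.754593849),
}


def relative_increase_percent(fp16, quant):
    """Percentage increase of the quantized score over the FP16 score."""
    return (quant - fp16) / fp16 * 100.0


increases = {
    name: relative_increase_percent(fp16, quant)
    for name, (fp16, quant) in scores.items()
}

for name, pct in increases.items():
    print(f"{name}: +{pct:.2f}%")
```

That works out to roughly +4.4% on C4, +7.5% on WikiText2 and +3.6% on PTB.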
