
This is a 4-bit GPTQ quant of https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b

My secret sauce (see the sketch after this list):

- Quantized with commit 3c16fd9 of 0cc4m's GPTQ fork
- PTB used as the calibration dataset
- Act-order, true-sequential, percdamp 0.1 (the default percdamp is 0.01)
- No groupsize
- Runs with CUDA; does not need Triton
- Quantization completed on a Google Colab instance with the 'Premium GPU' and 'High Memory' options
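
The following is a minimal sketch of what that invocation could look like, assuming 0cc4m's fork keeps upstream GPTQ-for-LLaMa's `llama.py` interface (positional calibration-dataset argument, `--wbits`, `--act-order`, `--true-sequential`, `--percdamp`, `--save_safetensors`). The paths and output filename are placeholders, not the exact command used here.

```python
# Hypothetical reconstruction of the quantization run described above.
# Flag names follow upstream GPTQ-for-LLaMa's llama.py; the fork may differ.
import subprocess

cmd = [
    "python", "llama.py",
    "/path/to/GPT4-x-AlpacaDente2-30b",  # placeholder: HF-format fp16 weights
    "ptb",                               # calibration dataset
    "--wbits", "4",                      # 4-bit quantization
    "--true-sequential",
    "--act-order",
    "--percdamp", "0.1",                 # raised from the 0.01 default
    # --groupsize is omitted, leaving the default of -1 (no groupsize)
    "--save_safetensors", "4bit.safetensors",  # placeholder output name
]
subprocess.run(cmd, check=True)
```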

Benchmark results (perplexity; lower is better)

| Model | C4 | WikiText2 | PTB |
| --- | --- | --- | --- |
| This quant | 7.326207160949707 | 4.957101345062256 | 24.941526412963867 |
| Aeala's quant (here) | x.xxxxxx | x.xxxxxx | x.xxxxxx |
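
These figures are GPTQ-style perplexities: the test split is tokenized into one long stream, scored in non-overlapping 2048-token windows, and the exponentiated mean token loss is reported. Below is a generic sketch of that evaluation for WikiText2, not the exact script used for the table; it loads the fp16 base weights purely to stay self-contained, whereas in practice the 4-bit checkpoint would be loaded through the GPTQ fork's own loader.

```python
# Generic GPTQ-style perplexity sketch (WikiText2); not the exact eval used above.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aeala/GPT4-x-AlpacaDente2-30b"  # fp16 base weights, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the test split into one token stream, then score 2048-token windows.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
input_ids = tokenizer("\n\n".join(test["text"]), return_tensors="pt").input_ids

seqlen = 2048
n_windows = input_ids.size(1) // seqlen
nlls = []
for i in range(n_windows):
    window = input_ids[:, i * seqlen : (i + 1) * seqlen].to(model.device)
    with torch.no_grad():
        # labels=window returns the mean next-token cross-entropy over the window
        loss = model(window, labels=window).loss
    nlls.append(loss.float() * seqlen)

ppl = torch.exp(torch.stack(nlls).sum() / (n_windows * seqlen))
print(f"WikiText2 perplexity: {ppl.item():.6f}")
```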