This is a 4-bit quant of https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b.

My secret sauce (a rough command sketch follows the list):

- Using commit 3c16fd9 of 0cc4m's GPTQ fork
- Using PTB as the calibration dataset
- Act-order, true-sequential, percdamp 0.1 (the default percdamp is 0.01)
- No groupsize
- Runs with CUDA; does not need Triton.
- Quant completed on a 'Premium GPU' and 'High Memory' Google Colab.
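
For reference, the quantization call looked roughly like the sketch below. This is a reconstruction, not the exact command: the flag names follow the common GPTQ-for-LLaMa `llama.py` CLI, the model/output paths are placeholders, and the exact interface in 0cc4m's fork at commit 3c16fd9 may differ.

```python
# Sketch only: approximate GPTQ quantization invocation matching the settings above.
# Flag names are assumed from the GPTQ-for-LLaMa llama.py CLI; paths are placeholders.
import subprocess

subprocess.run(
    [
        "python", "llama.py",
        "Aeala/GPT4-x-AlpacaDente2-30b",   # FP16 source model
        "ptb",                              # PTB as the calibration dataset
        "--wbits", "4",                     # 4-bit quantization
        "--act-order",                      # act-order
        "--true-sequential",                # true-sequential
        "--percdamp", "0.1",                # raised from the 0.01 default
        # no --groupsize flag: no groupsize used
        "--save_safetensors", "gpt4-x-alpacadente2-30b-4bit.safetensors",
    ],
    check=True,
)
```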

Benchmark results (perplexity, lower is better)

| Model | C4 | WikiText2 | PTB |
|---|---|---|---|
| Aeala's FP16 | 7.05504846572876 | 4.662261962890625 | 24.547462463378906 |
| This quant | 7.326207160949707 | 4.957101345062256 | 24.941526412963867 |
| Aeala's quant | 7.332120418548584 | 5.016242980957031 | 25.576189041137695 |
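
If you want to reproduce this kind of number, below is a minimal sliding-window perplexity sketch using `transformers`. The dataset name, context length (2048), and stride are assumptions, not the exact evaluation script used for the table above.

```python
# Minimal sliding-window perplexity sketch (WikiText2 shown); not the exact
# benchmark script used for the table above.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Aeala/GPT4-x-AlpacaDente2-30b"  # or point at the quantized checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the test split into one long token stream.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_len, stride = 2048, 2048  # non-overlapping 2048-token windows (assumed)
nlls, n_tokens = [], 0
for begin in range(0, enc.input_ids.size(1) - 1, stride):
    end = min(begin + max_len, enc.input_ids.size(1))
    input_ids = enc.input_ids[:, begin:end].to(model.device)
    with torch.no_grad():
        out = model(input_ids, labels=input_ids)
    n = input_ids.size(1) - 1        # number of predicted tokens in this window
    nlls.append(out.loss * n)        # out.loss is the mean NLL over those tokens
    n_tokens += n

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"WikiText2 perplexity: {ppl.item():.4f}")
```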