My secret sauce:
- Using commit 3c16fd9 of 0cc4m's GPTQ fork
- Using PTB as the calibration dataset
- Act-order, true-sequential, percdamp 0.1 (the default percdamp is 0.01)
- No groupsize
- Runs with CUDA; does not need Triton
- Quantization was completed on a 'Premium GPU' and 'High Memory' Google Colab instance
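Given the settings above, the quantization command would have looked roughly like the following. This is a sketch only: the flag names are taken from upstream GPTQ-for-LLaMa and may differ in 0cc4m's fork at that commit, and `<model-dir>` / the output filename are placeholders, not the actual paths used.

```shell
# Sketch, assuming upstream GPTQ-for-LLaMa flag names; 0cc4m's fork may differ.
# <model-dir> is a placeholder for the FP16 model path.
python llama.py <model-dir> ptb \
    --wbits 4 \
    --true-sequential \
    --act-order \
    --percdamp 0.1 \
    --save_safetensors model-4bit.safetensors
```

No `--groupsize` flag is passed, matching the "no groupsize" choice above (upstream defaults to no grouping when the flag is omitted).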
Benchmark results:

| Model | C4 | WikiText2 | PTB |
|---|---|---|---|
| Aeala's FP16 | 7.05504846572876 | 4.662261962890625 | 24.547462463378906 |
| This Quant | 7.326207160949707 | 4.957101345062256 | 24.941526412963867 |
| Aeala's Quant here | 7.332120418548584 | 5.016242980957031 | 25.576189041137695 |
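For context, the relative perplexity increase of this quant over the FP16 baseline can be computed directly from the table values:

```python
# Perplexities copied from the benchmark table above.
fp16 = {"C4": 7.05504846572876, "WikiText2": 4.662261962890625, "PTB": 24.547462463378906}
quant = {"C4": 7.326207160949707, "WikiText2": 4.957101345062256, "PTB": 24.941526412963867}

for ds in fp16:
    # Percentage increase in perplexity relative to FP16 (lower is better).
    increase = (quant[ds] / fp16[ds] - 1) * 100
    print(f"{ds}: +{increase:.2f}% perplexity vs FP16")
```

The degradation is largest on WikiText2 (a bit over 6%) and smallest on PTB, which is expected since PTB was the calibration dataset.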