EXL2 Quantization of Gryphe's MythoMax L2 13B.

Other quantized models are available from TheBloke: GGML - GPTQ - GGUF - AWQ

Model details

| Branch | Bits | Perplexity | Description |
| --- | --- | --- | --- |
| main | 5 | 6.1018 | Up to 6144 context size on T4 GPU |
| 6bit | 6 | 6.1182 | 4096 context size (tokens) on T4 GPU |
| 3bit | 3 | 6.3666 | Low-bit quant that still performs well |
| 4bit | 4 | 6.1601 | Slightly better than 4-bit GPTQ; easy 8K context on T4 GPU |
| - | 7 | 6.1056 | 2048 max context size on T4 GPU |
| - | 8 | 6.1027 | Just, why? |
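
Each uploaded quant lives on its own branch, so picking one comes down to the context size you need. The following is a minimal Python sketch, not an official loader: the numbers are copied from the table above, "8K" is read as 8192 tokens (an assumption), and `pick_branch` and `download_branch` are hypothetical helper names. The download step assumes the third-party `huggingface_hub` package is installed and that you pass in this repository's id yourself.

```python
# Values copied from the model details table; context sizes are the
# author's T4 GPU figures. "8K" is read as 8192 tokens (an assumption).
QUANTS = {
    "main": {"bits": 5, "perplexity": 6.1018, "t4_context": 6144},
    "6bit": {"bits": 6, "perplexity": 6.1182, "t4_context": 4096},
    "4bit": {"bits": 4, "perplexity": 6.1601, "t4_context": 8192},
    "3bit": {"bits": 3, "perplexity": 6.3666, "t4_context": None},  # not stated
}


def pick_branch(min_context):
    """Return the lowest-perplexity branch whose listed T4 context is at least min_context."""
    candidates = [
        (info["perplexity"], name)
        for name, info in QUANTS.items()
        if info["t4_context"] is not None and info["t4_context"] >= min_context
    ]
    if not candidates:
        raise ValueError(f"no listed branch reaches {min_context} tokens on a T4")
    return min(candidates)[1]


def download_branch(repo_id, branch):
    """Download one branch of the repo and return the local folder path.

    Imported lazily so the table lookup above works without huggingface_hub.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, revision=branch)
```

For example, `pick_branch(8192)` selects the `4bit` branch, since it is the only one listed with 8K context on a T4.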

I'll upload the 7-bit and 8-bit quants if someone requests them. (I don't know why the 5-bit quant's perplexity is lower than that of the higher-bit quants; I may have done something wrong.)

Prompt Format

Alpaca format:

### Instruction:





### Response:
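
A minimal Python sketch of filling in this template programmatically. `build_alpaca_prompt` is a hypothetical helper name, and the exact whitespace (one blank line before `### Response:` and a trailing newline) is an assumption based on the common Alpaca layout rather than something this card specifies.

```python
def build_alpaca_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca-style headers shown above."""
    # Assumed layout: instruction directly under its header, one blank line,
    # then the response header, after which the model continues the text.
    return f"### Instruction:\n{instruction.strip()}\n\n### Response:\n"


prompt = build_alpaca_prompt("Write a short greeting.")
```

The returned string is what gets passed to the model as the prompt; generation should begin right after the `### Response:` line.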