[EXL2](https://github.com/turboderp/exllamav2/tree/master#exllamav2) Quantization of [Gryphe's MythoMax L2 13B](https://huggingface.co/Gryphe/MythoMax-L2-13b). Other quantized models are available from TheBloke: [GGML](https://huggingface.co/TheBloke/MythoMax-L2-13B-GGML) - [GPTQ](https://huggingface.co/TheBloke/MythoMax-L2-13B-GPTQ) - [GGUF](https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF) - [AWQ](https://huggingface.co/TheBloke/MythoMax-L2-13B-AWQ) ## Model details | Branch | Bits | Perplexity | Desc | |----------------------------------------------------------------------|------|------------|---------------------------------------------------------| | [main](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/main) | 5 | 6.1018 | Up to 6144 context size on T4 GPU | | [6bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/6bit) | 6 | 6.1182 | 4096 context size (tokens) on T4 GPU | | [3bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/3bit) | 3 | 6.3666 | Low bits quant while still good | | [4bit](https://huggingface.co/R136a1/MythoMax-L2-13B-exl2/tree/4bit) | 4 | 6.1601 | Slightly better than 4bit GPTQ, ez 8K context on T4 GPU | | - | 7 | 6.1056 | 2048 max context size for T4 GPU | | - | 8 | 6.1027 | Just, why? | I'll upload the 7 and 8 bits quant if someone request it. (Idk y the 5 bits quant preplexity is lower than higher bits quant, I think I did something wrong?) ## Prompt Format Alpaca format: ``` ### Instruction: ### Response: ```