EXL2 quantization at 4.0bpw?
Is there any way you can quantize this model at 4.0/4.6bpw? I think lzlv is unmatched for roleplay among 70Bs, but the only thing holding it back is the context length. I have 48 GB of VRAM, so this quantization is outside what I can run. I think people would be interested if it could be run on two 24 GB cards.
Sorry about the delay; I was dealing with a hardware crash. It may take me another week to resolve my issues, but I'll make a 4-bit version if no one has done it by then. I was hoping someone else (e.g., TheBloke) would do more quants, but since no one has, I'll try to add more when I get a chance.
Thanks for pointing it out.
I am slowly clawing my way back to functionality after losing a lot of hardware and data in an electrical outage. Fortunately this lzlv merge was uploaded to HF, so I was finally able to get the quant out. Here is the 4-bit.
Is there a better quant that fits in 48 GB? 4.6bpw?
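For rough sizing, a back-of-the-envelope sketch (my own estimate, not from this thread): weight memory is roughly parameter count × bits-per-weight ÷ 8, so a ~70B model at 4.0bpw needs about 35 GB for weights alone, leaving headroom in 48 GB for the KV cache and longer context.

```python
# Back-of-the-envelope VRAM estimate for an EXL2 quant.
# Assumptions (mine, not from the thread): ~70e9 parameters,
# and 1 GB = 1e9 bytes. KV cache and activations need extra
# room on top of the weights, so leave headroom below 48 GB.

def weights_gb(n_params: float, bpw: float) -> float:
    """Approximate weight memory in GB at a given bits-per-weight."""
    return n_params * bpw / 8 / 1e9

for bpw in (4.0, 4.65, 6.0):
    print(f"{bpw:.2f} bpw -> ~{weights_gb(70e9, bpw):.1f} GB weights")
# 4.00 bpw -> ~35.0 GB weights
# 4.65 bpw -> ~40.7 GB weights
# 6.00 bpw -> ~52.5 GB weights
```

By this estimate, something around 4.65bpw should still fit in 48 GB with room for a decent context, while 6.0bpw clearly would not.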
Just saw that you were able to make this happen. Thanks so much! I’ll test it out tonight and see how much memory I have left 🫡