EXL2 quantization at 4.0bpw?
Is there any way you can quantize this model at 4.0/4.6bpw? I think lzlv is unmatched for roleplay among 70Bs, but the only thing holding it back is the context length. I have 48 GB of VRAM, so this quantization is outside what I can run. I think people would be interested if it could be run on two 24 GB cards.
Sorry about the delay; I was dealing with a hardware crash. It may take me another week to resolve my issues, but I'll make a 4-bit version if no one has done it by then. I was hoping someone else (e.g., TheBloke) would do more quants, but since no one has, I'll try to add more when I get a chance.
Thanks for pointing it out.
I am slowly clawing my way back to functionality after losing a lot of hardware and data in an electrical outage. Fortunately this lzlv merge was uploaded to HF, so I was finally able to get the quant out. Here is the 4-bit.
Is there a better quant that fits in 48 GB? 4.6bpw?
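For rough sizing, a back-of-the-envelope sketch (my own estimate, not from this thread): weight memory is roughly parameter count × bits-per-weight ÷ 8, so a ~70B model at 4.0bpw needs about 35 GB for weights alone, leaving headroom in 48 GB for the KV cache and longer context.

```python
# Back-of-the-envelope VRAM estimate for an EXL2 quant.
# Assumptions (mine, not from the thread): ~70e9 parameters,
# and 1 GB = 1e9 bytes. KV cache and activations need extra
# room on top of the weights, so leave headroom below 48 GB.

def weights_gb(n_params: float, bpw: float) -> float:
    """Approximate weight memory in GB at a given bits-per-weight."""
    return n_params * bpw / 8 / 1e9

for bpw in (4.0, 4.65, 6.0):
    print(f"{bpw:.2f} bpw -> ~{weights_gb(70e9, bpw):.1f} GB weights")
# 4.00 bpw -> ~35.0 GB weights
# 4.65 bpw -> ~40.7 GB weights
# 6.00 bpw -> ~52.5 GB weights
```

By this estimate, something around 4.65bpw should still fit in 48 GB with room for a decent context, while 6.0bpw clearly would not.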
Just saw that you were able to make this happen. Thanks so much! I’ll test it out tonight and see how much memory I have left 🫡