Gibberish output with 4bpw on a multi-GPU system

#2
by Panchovix - opened

Hello there, thanks for the quants!

I have been trying the 4bpw one, but I only get gibberish and/or nonsensical output.

I'm running it on Fedora with 208GB of VRAM across 7 GPUs, using exllamav3 dev and tabbyapi.

It works with the small edit in this PR: https://github.com/turboderp-org/exllamav3/pull/68

The issue and the fix ended up being completely different from what I proposed, but the dev branch should be fixed now.

MikeRoz changed discussion status to closed
