Prompt sensitivity problems

#44
by TheBigBlockPC - opened

I use diffusers for my inference and quantize the model to NF4 using bitsandbytes, but the model seems extremely sensitive to the prompt format: short prompts usually cause it to break.
My current workaround is to use an LLM to enhance the prompts.
Is that sensitivity due to the quantization, or is it present in the fp16 version too?
I can't really test this myself because of my limited RAM and VRAM; I have a 5090 and 64 GB of RAM.
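For context, here is a minimal sketch of the NF4 quantization setup described above, using diffusers' `BitsAndBytesConfig`. The model ID and transformer class are illustrative placeholders, not necessarily the poster's actual setup:

```python
import torch
from diffusers import BitsAndBytesConfig

# NF4 4-bit quantization config (bitsandbytes backend).
# bnb_4bit_compute_dtype keeps matmuls in bf16, which can reduce
# numerical degradation compared to fp16 compute.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# The transformer/UNet is usually the memory-heavy part; quantize it
# alone and load the rest of the pipeline normally. Model ID below is
# a placeholder:
# transformer = FluxTransformer2DModel.from_pretrained(
#     "black-forest-labs/FLUX.1-dev",
#     subfolder="transformer",
#     quantization_config=nf4_config,
# )
```

A side-by-side comparison (same seed, same prompt) between the NF4 and fp16 checkpoints would isolate whether the short-prompt failures come from quantization or from the base model.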
