IQ1 Quant speed issues
#7
by
DefaultDF
- opened
My friend tried running the IQ3_XXS version of this on CPU (with partial offload to GPU) and gets 6 t/s. For fun, we then tried IQ1_S, expecting better speed, and I get only 2 t/s... Why? I have enough VRAM + RAM, but I get worse performance.
shimmyshimmer
changed discussion title from
IQ1 Quant poor performance
to IQ1 Quant speed issues
Not enough information to help you. For each case, please provide:
- versions of all things used
- specs
- OS
- inference engine
- all non-default parameters used for the model (temp etc.) and for running it (context size, SWA etc.)
- how many layers on GPU/CPU
- anything you observe that differs from running other models in the same manner
http://catb.org/~esr/faqs/smart-questions.html might be worth a cursory read
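Part of the checklist above (OS and basic specs) can be gathered automatically. The following is only a minimal sketch using the Python standard library. The function name `collect_report` and its arguments are hypothetical, the engine name, layer count and parameter in the usage example are placeholder values, and nothing here queries the inference engine itself, so those fields still have to be filled in by hand.

```python
import os
import platform
import sys


def collect_report(engine="", engine_version="", gpu_layers=None, extra_params=None):
    """Build a plain-text report covering the checklist items.

    Hypothetical helper: OS/CPU details come from the standard library,
    everything about the inference engine is supplied by the caller.
    """
    lines = [
        f"OS: {platform.system()} {platform.release()}",
        f"Machine: {platform.machine()}",
        f"CPU: {platform.processor() or 'unknown'}",
        f"Logical CPU cores: {os.cpu_count()}",
        f"Python: {sys.version.split()[0]}",
        f"Inference engine: {engine or 'FILL IN'} {engine_version}".rstrip(),
        f"Layers on GPU: {gpu_layers if gpu_layers is not None else 'FILL IN'}",
    ]
    # List every non-default parameter, sorted by name for stable output.
    for name, value in sorted((extra_params or {}).items()):
        lines.append(f"Non-default parameter: {name} = {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(collect_report(engine="llama.cpp", gpu_layers=20,
                         extra_params={"ctx_size": 8192}))
```

Running it once per case (IQ3_XXS and IQ1_S) and pasting both reports side by side makes the differences between the two runs easy to spot. RAM, VRAM and GPU model are not covered by the standard library and would need to be added manually.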