IQ1 Quant speed issues

#7
by DefaultDF - opened

My friend tried to run the IQ3_XXS version of this model on CPU (with partial offload to GPU) and got 6 t/s. But for fun we tried IQ1_S, expecting better speed, and I got 2 t/s... Why? I had enough VRAM+RAM, but got worse performance.

shimmyshimmer changed discussion title from IQ1 Quant poor performance to IQ1 Quant speed issues

Not enough information to help you. For each case, please provide:

  • versions of everything used
  • hardware specs
  • OS
  • inference engine
  • all non-default parameters used for the model (temperature, etc.) and for running it (context size, SWA, etc.)
  • how many layers are on GPU vs. CPU
  • any observations that differ from running other models the same way
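As a rough sketch (not part of the original reply), a few lines of Python can gather the items above into one block of text that can be pasted into the thread for each case. Only the OS and CPU lines are read automatically from the standard library; the function name `collect_report`, its fields, and the placeholder values in the usage example are all hypothetical and must be filled in by hand from your own setup.

```python
import os
import platform


def collect_report(engine: str, engine_version: str, quant: str,
                   gpu_layers: int, total_layers: int,
                   ram_gb: float, vram_gb: float,
                   params: dict, tokens_per_second: float) -> str:
    """Format the requested details as plain text (hypothetical helper).

    Everything except the OS/CPU lines is supplied manually by the user.
    """
    lines = [
        # Read from the running system via the standard library.
        f"OS: {platform.system()} {platform.release()}",
        f"CPU: {platform.processor() or platform.machine()} "
        f"({os.cpu_count()} logical cores)",
        # Entered by hand: stdlib cannot query RAM/VRAM portably.
        f"Memory: {ram_gb} GB RAM, {vram_gb} GB VRAM",
        f"Inference engine: {engine} {engine_version}",
        f"Quant: {quant}",
        f"Layers: {gpu_layers} on GPU, {total_layers - gpu_layers} on CPU "
        f"(of {total_layers})",
    ]
    # Non-default parameters, sorted so two reports are easy to diff.
    for key, value in sorted(params.items()):
        lines.append(f"param {key} = {value}")
    lines.append(f"Observed speed: {tokens_per_second:.1f} t/s")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values only; replace them with your own setup.
    print(collect_report("my-engine", "vX.Y", "IQ1_S", 10, 40,
                         32.0, 8.0, {"temp": 0.6, "ctx-size": 4096}, 2.0))
```

Filling in one report per quant (for example IQ3_XXS and IQ1_S) with identical fields makes it easier to spot which setting actually differs between the two runs.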

http://catb.org/~esr/faqs/smart-questions.html might be worth a cursory read
