Excellent work.
#1 opened by Nexesenex
Brandon's model customization that you converted to GGML hasn't gotten the recognition it deserves. It performs better than SuperHOT 8k, and it's now my main model, in both GPTQ and GGML, for SillyTavern RP scenarios.
Could you make a Q3_K_S GGML quantization and share it with us? I don't have an iGPU, so I'm about half a gigabyte of VRAM short of running the full context in Q3_K_M with all 63 layers offloaded in KoboldCPP, unless I fall back on the "lowvram" parameter. A rough sketch of what I mean is below.
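If it helps, this is roughly what I have in mind, assuming a GGML-era llama.cpp build with the `quantize` tool; the file names are just placeholders, and the launch line uses KoboldCPP's usual `--usecublas`, `--gpulayers`, and `--contextsize` flags:

```sh
# Hypothetical file names; assumes an f16 GGML export of the model already exists.
./quantize ./models/model-f16.ggml ./models/model-q3_K_S.ggml q3_K_S

# Load it in KoboldCPP with all 63 layers offloaded and full 8k context.
# (Appending "lowvram" after --usecublas is the fallback I'd rather avoid.)
python koboldcpp.py ./models/model-q3_K_S.ggml --usecublas --gpulayers 63 --contextsize 8192
```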
Thanks for your work in any case!