Excellent work.
#1 opened by Nexesenex
Brandon's model customization that you converted to GGML hasn't gotten the recognition it deserves. It performs better than SuperHOT 8k, and it's now my main model, in both GPTQ and GGML, for SillyTavern RP scenarios.
Could you make a Q3_K_S GGML quantization and share it with us? I don't have an iGPU, so I'm about half a gigabyte of VRAM short of running the full context in Q3_K_M with all 63 layers offloaded in KoboldCPP, unless I fall back on the "lowvram" parameter. A rough sketch of what I mean is below.
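If it helps, this is roughly what I have in mind, assuming a GGML-era llama.cpp build with the `quantize` tool; the file names are just placeholders, and the launch line uses KoboldCPP's usual `--usecublas`, `--gpulayers`, and `--contextsize` flags:

```sh
# Hypothetical file names; assumes an f16 GGML export of the model already exists.
./quantize ./models/model-f16.ggml ./models/model-q3_K_S.ggml q3_K_S

# Load it in KoboldCPP with all 63 layers offloaded and full 8k context.
# (Appending "lowvram" after --usecublas is the fallback I'd rather avoid.)
python koboldcpp.py ./models/model-q3_K_S.ggml --usecublas --gpulayers 63 --contextsize 8192
```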
Thanks for your work in any case!