Is it possible to convert this model to GGUF? That would speed up inference, and the LLM part could be quantized to reduce the memory footprint.
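For reference, a minimal sketch of the usual llama.cpp conversion flow, assuming this model's architecture is actually supported by the converter (multimodal models often need extra steps or are unsupported) and that the checkpoint lives at a hypothetical local path:

```shell
# Hedged sketch: assumes llama.cpp's converter recognizes this architecture.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert the Hugging Face checkpoint to an FP16 GGUF file
# (/path/to/model is a placeholder for the downloaded repo).
python convert_hf_to_gguf.py /path/to/model --outfile model-f16.gguf

# Build the tools, then quantize the LLM weights (e.g. Q4_K_M)
# to shrink the memory footprint.
cmake -B build && cmake --build build --target llama-quantize
./build/bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

If the converter errors out on the architecture name, the model likely needs explicit support added in llama.cpp first.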