GGUF support request
Thank you @J22. That's great!
chatllm.cpp supports this now:
Hey, I see you're working fast. Could you please give us a tutorial for Windows users on how to start an API server we could connect to remotely? I usually use LM Studio, which takes care of a lot of that technical stuff for me, but I wouldn't mind getting my hands a bit dirty for models that are not yet available in GGUF format but are already supported by chatllm. I downloaded the pre-compiled Windows version and I'd like to run the OpenAI-compatible local server if possible.
@MrDevolver There is an example: scripts/openai_api.py.
Okay, that's a script by itself, but that alone doesn't help much. I did find a bit of information in docs/binding.md, but it looks like the script expects two models to be loaded at once, one for chatting and one for completions? Usually when I run the server in LM Studio it just serves a single model and that works fine. Also, it would be nice if this could be run from the pre-compiled binary instead of the script.
@MrDevolver Thanks for your feedback. The doc was outdated; I've just updated it.
Here is how to serve this model through the OpenAI API:
python openai_api.py ---chat :bailing -ngl all
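Once the server is running, any OpenAI-compatible client should be able to talk to it. Here is a minimal sketch using the `openai` Python package; the address/port and model id below are only assumptions, so check the script's startup output and adjust `base_url` accordingly:

```python
# Minimal client sketch for the local OpenAI-compatible server started by
# scripts/openai_api.py. The base_url, port, and model name are assumptions --
# use whatever the script actually reports on startup.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:3000/v1",  # placeholder address; adjust to your server
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="bailing",                      # placeholder model id; the server may ignore it
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

To reach the server from another machine on your network, point `base_url` at the host's LAN IP instead of 127.0.0.1 (and make sure the port is allowed through the Windows firewall).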
Btw, the pre-compiled binaries do not support this model yet; you need to build it yourself.