Quantized with the older GPTQ code, before later updates broke compatibility with existing quantized models.
Use with:
https://github.com/Ph0rk0z/text-generation-webui-testing
https://github.com/Ph0rk0z/GPTQ-Merged
https://github.com/Curlypla/peft-GPTQ
Clone the last two repos into text-generation-webui-testing/repositories.
Run python cuda_setup.py install inside GPTQ-Merged to compile the CUDA kernel.
python server.py --cai-chat --gptq-bits 4 --model opt-6b --autograd
Don't forget to get the configs from https://huggingface.co/facebook/opt-6.7b/tree/main
You only need the JSON files, plus merges.txt.
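The config step above can be sanity-checked with a small script before launching the server. The file list below is an assumption based on a typical OPT checkpoint repo; the actual file listing on the Hugging Face page is authoritative.

```python
import os

# Config/tokenizer files the webui expects next to the quantized weights.
# Assumed list for an OPT-style model; verify against the repo's file tree.
REQUIRED_FILES = [
    "config.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.json",
    "merges.txt",
]

def missing_model_files(model_dir):
    """Return the required config/tokenizer files not present in model_dir."""
    return [f for f in REQUIRED_FILES
            if not os.path.isfile(os.path.join(model_dir, f))]
```

For example, missing_model_files("models/opt-6b") returns an empty list when the model folder is complete.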