metadata
inference: false
license: other
Tim Dettmers' Guanaco 33B GPTQ
These files are 4-bit GPTQ model files for Tim Dettmers' Guanaco 33B.
They are the result of quantising to 4-bit using GPTQ-for-LLaMa.
Other repositories available
- 4-bit GPTQ models for GPU inference
- 4-bit, 5-bit and 8-bit GGML models for CPU(+GPU) inference
- Original unquantised fp16 model in HF format
How to easily download and use this model in text-generation-webui
Open the text-generation-webui UI as normal.
- Click the Model tab.
- Under Download custom model or LoRA, enter TheBloke/guanaco-33B-GPTQ.
- Click Download.
- Wait until it says it's finished downloading.
- Click the Refresh icon next to Model in the top left.
- In the Model drop-down, choose the model you just downloaded, guanaco-33B-GPTQ.
- If you see an error in the bottom right, ignore it - it's temporary.
- Fill out the GPTQ parameters on the right: Bits = 4, Groupsize = None, model_type = Llama
- Click Save settings for this model in the top right.
- Click Reload the Model in the top right.
- Once it says it's loaded, click the Text Generation tab and enter a prompt!
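For readers who prefer to fetch the files outside the UI, the download step above can be sketched in Python. This is a minimal sketch, not part of the original instructions: it assumes the huggingface_hub package is installed, and the helper names local_model_dir and download_model are hypothetical. The folder naming (replacing "/" with "_") is an assumption about how text-generation-webui lays out its models directory.

```python
from pathlib import Path

REPO_ID = "TheBloke/guanaco-33B-GPTQ"  # repository name from the steps above


def local_model_dir(models_root: str, repo_id: str = REPO_ID) -> Path:
    """Return the target folder, with '/' in the repo id replaced by '_'.

    Assumption: this mirrors the folder naming used by text-generation-webui.
    """
    return Path(models_root) / repo_id.replace("/", "_")


def download_model(models_root: str = "models") -> Path:
    """Download every file in the repository into the models folder."""
    # Imported lazily so local_model_dir works without the package installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(models_root)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling download_model("models") from the text-generation-webui folder should place the files where the Model drop-down can find them after a refresh, provided the folder-naming assumption holds for your version.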
Provided files
Compatible file - Guanaco-33B-GPTQ-4bit.act-order.safetensors
In the main branch you will find Guanaco-33B-GPTQ-4bit.act-order.safetensors
This will work with all versions of GPTQ-for-LLaMa. It has maximum compatibility.
It was created without groupsize to minimise VRAM requirements and keep them under 24GB. It was created with the --act-order parameter to maximise accuracy.
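To see roughly why omitting groupsize saves memory, the weight storage can be estimated with some back-of-the-envelope arithmetic. The sketch below is an approximation under stated assumptions, not a measurement: it assumes a nominal 33 billion parameters, and that each group stores one 16-bit scale and one 4-bit zero point. It ignores activations, the context cache, and any layers kept in fp16. The function name quantised_weight_gib is hypothetical.

```python
from typing import Optional


def quantised_weight_gib(
    n_params: float,
    bits: int = 4,
    group_size: Optional[int] = None,
    scale_bits: int = 16,  # assumption: one fp16 scale per group
    zero_bits: int = 4,    # assumption: one 4-bit zero point per group
) -> float:
    """Approximate size of the quantised weights in GiB."""
    bits_per_weight = float(bits)
    if group_size:
        # Per-group metadata is spread across the weights in that group.
        bits_per_weight += (scale_bits + zero_bits) / group_size
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


N_PARAMS = 33e9  # nominal parameter count, an assumption

no_group = quantised_weight_gib(N_PARAMS)
grouped = quantised_weight_gib(N_PARAMS, group_size=128)
print(f"no groupsize:  {no_group:.2f} GiB")
print(f"groupsize 128: {grouped:.2f} GiB")
```

Under these assumptions the weights alone come to roughly 15.4 GiB without groupsize, and a groupsize of 128 adds a little over half a GiB on top. On a 24GB card that margin also has to cover everything the estimate leaves out.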
Guanaco-33B-GPTQ-4bit.act-order.safetensors
- Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
- Works with AutoGPTQ
- Works with text-generation-webui one-click-installers
- Parameters: Groupsize = None. --act-order.
- Command used to create the GPTQ:
python llama.py /workspace/process/TheBloke_guanaco-33B-GGML/HF wikitext2 --wbits 4 --true-sequential --act-order --save_safetensors /workspace/process/TheBloke_guanaco-33B-GGML/gptq/Guanaco-33B-GPTQ-4bit.act-order.safetensors
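Since the file is listed as working with AutoGPTQ, loading it from Python might look like the following. This is a hedged sketch rather than a tested recipe for this repository: it assumes the auto_gptq and transformers packages are installed, that a CUDA GPU is available, and that the repository includes tokenizer files. The quantisation settings are passed explicitly to mirror the parameters listed above (4 bits, no groupsize, act-order). The helper name load_model is hypothetical.

```python
MODEL_REPO = "TheBloke/guanaco-33B-GPTQ"
# File name from the list above, without the ".safetensors" extension.
MODEL_BASENAME = "Guanaco-33B-GPTQ-4bit.act-order"


def load_model(device: str = "cuda:0"):
    """Load the tokenizer and the quantised model with AutoGPTQ."""
    # Imported lazily so the constants can be inspected without the packages.
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
    from transformers import AutoTokenizer

    # bits=4, no groupsize (-1), act-order (desc_act=True), as listed above.
    quantize_config = BaseQuantizeConfig(bits=4, group_size=-1, desc_act=True)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_REPO, use_fast=True)
    model = AutoGPTQForCausalLM.from_quantized(
        MODEL_REPO,
        model_basename=MODEL_BASENAME,
        use_safetensors=True,
        device=device,
        use_triton=False,
        quantize_config=quantize_config,
    )
    return tokenizer, model
```

After tokenizer, model = load_model(), text can be generated with the usual model.generate call on tokenised input. The prompt format expected by Guanaco is not specified in this card.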
Original model card
Not provided by original model creator.