Update README.md
README.md CHANGED
@@ -1,6 +1,12 @@
 ---
 inference: false
-license:
+license: cc-by-nc-sa-4.0
+language:
+- en
+library_name: transformers
+pipeline_tag: text-generation
+datasets:
+- psmathur/orca_minis_uncensored_dataset
 ---
 
 <!-- header start -->
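This hunk fills in the model card's YAML front matter, which the Hub parses from the block between the two `---` markers to populate the license, language, library, pipeline, and dataset tags. As a rough sketch of what that block resolves to, assuming only `pyyaml` and a local copy of this `README.md` (the Hub's own parser is stricter):

```python
import yaml

# Split out the YAML front matter between the two leading '---' markers.
# The file starts with '---', so the first piece of the split is empty.
with open("README.md", encoding="utf-8") as f:
    _, front_matter, _ = f.read().split("---", 2)

metadata = yaml.safe_load(front_matter)
print(metadata["license"])   # cc-by-nc-sa-4.0
print(metadata["datasets"])  # ['psmathur/orca_minis_uncensored_dataset']
```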
@@ -89,17 +95,17 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 quantize_config=None)
 
 prompt = "Tell me about AI"
+input = ""
 prompt_template=f'''### System:
 You are an AI assistant that follows instruction extremely well. Help as much as you can.
 
 ### User:
-prompt
+{prompt}
 
 ### Input:
-input
+{input}
 
 ### Response:
-
 '''
 
 print("\n\n*** Generate:")
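This hunk is a real bug fix rather than a wording change: the old example placed the bare words `prompt` and `input` inside the f-string, so the rendered template contained those literal words instead of their values, and `input` was never defined at all. The fix wraps both names in braces and defines `input = ""` for the Orca-style Input slot. A self-contained sketch of the corrected template, using only the names that appear in the diff:

```python
prompt = "Tell me about AI"
input = ""  # Input slot left empty; note this shadows the builtin input()

# With the braces present, the f-string substitutes the variables' values;
# the old template emitted the literal words "prompt" and "input" instead.
prompt_template = f'''### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
{prompt}

### Input:
{input}

### Response:
'''

print(prompt_template)
```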
@@ -139,7 +145,7 @@ It was created with group_size 128 to increase inference accuracy, but without -
 
 * `orca_mini_v2_13b-GPTQ-4bit-128g.no-act.order.safetensors`
   * Works with AutoGPTQ in CUDA or Triton modes.
-  * [ExLlama](https://github.com/turboderp/exllama)
+  * [ExLlama](https://github.com/turboderp/exllama) supports Llama 4-bit GPTQs, and will provide 2x speedup over AutoGPTQ and GPTQ-for-LLaMa.
   * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
   * Works with text-generation-webui, including one-click-installers.
   * Parameters: Groupsize = 128. Act Order / desc_act = False.
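The last bullet pins down the quantisation settings for the provided file. As an illustrative sketch only, those parameters map onto AutoGPTQ's `BaseQuantizeConfig` as below; they describe how the `.safetensors` file was produced, and the README's own loading example passes `quantize_config=None` so the values are read from the repo's config instead:

```python
from auto_gptq import BaseQuantizeConfig

# The bullet's settings as an AutoGPTQ quantize config (a sketch of how the
# file was made, not an argument you need to pass when loading this repo).
quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit GPTQ
    group_size=128,  # "Groupsize = 128"
    desc_act=False,  # "Act Order / desc_act = False"
)
```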