Update README.md
README.md CHANGED
@@ -13,9 +13,7 @@ library_name: transformers
 tags:
 - autoround
 - auto-round
-- autogptq
 - gptq
-- auto-gptq
 - woq
 - meta
 - pytorch
@@ -39,7 +37,9 @@ Quantized version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
 - 8 bits (INT8)
 - group size = 128
 - Asymmetrical Quantization
-- Method
+- Method WoQ (AutoRound format)
+
+Fast and low memory, 2-3X speedup (slight accuracy drop at W4G128)
 
 Quantization framework: [Intel AutoRound](https://github.com/intel/auto-round)
 
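As context for the scheme listed above (not part of this commit, and not AutoRound's actual algorithm, which additionally tunes rounding and clipping via signed-gradient descent): a minimal round-to-nearest sketch of what "INT8, group size = 128, asymmetric" computes for a single group of weights. All names here are illustrative.

```python
import numpy as np

def quantize_group_asym_int8(w: np.ndarray):
    """Round-to-nearest asymmetric INT8 quantization of one weight group."""
    qmin, qmax = 0, 255
    scale = (w.max() - w.min()) / (qmax - qmin)   # per-group scale
    zero_point = int(round(-w.min() / scale))     # maps w.min() onto qmin
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

# With group size = 128, each run of 128 weights gets its own
# (scale, zero_point) pair, so an outlier only degrades its own group.
rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, scale, zp = quantize_group_asym_int8(w)
w_hat = (q.astype(np.float32) - zp) * scale       # dequantized approximation
print("max abs error:", float(np.abs(w - w_hat).max()))
```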
@@ -75,8 +75,8 @@ pip install -vvv --no-build-isolation -e .[cpu]
 bits, group_size, sym, device, amp = 8, 128, False, 'cpu', False
 autoround = AutoRound(model, tokenizer, nsamples=128, iters=200, seqlen=512, batch_size=4, bits=bits, group_size=group_size, sym=sym, device=device, amp=amp)
 autoround.quantize()
-output_dir = "./AutoRound/meta-llama_Llama-3.2-1B-Instruct-
-autoround.save_quantized(output_dir, format='
+output_dir = "./AutoRound/meta-llama_Llama-3.2-1B-Instruct-auto_round-int8-gs128-asym"
+autoround.save_quantized(output_dir, format='auto_round', inplace=True)
 ```
 
 ## License
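Not shown in the diff, but the usual next step: a sketch of loading the directory that `save_quantized` writes, assuming the auto-round integration with transformers described in the AutoRound repository (importing `AutoRoundConfig` registers the `auto_round` format; the prompt and generation settings are illustrative).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  # noqa: F401 -- import registers the auto_round format

# Same path the quantization snippet above saved to.
output_dir = "./AutoRound/meta-llama_Llama-3.2-1B-Instruct-auto_round-int8-gs128-asym"

model = AutoModelForCausalLM.from_pretrained(output_dir, device_map="cpu")
tokenizer = AutoTokenizer.from_pretrained(output_dir)

inputs = tokenizer("What is weight-only quantization?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```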