## MiniCPM-o 2.6 int4

This is the int4 quantized version of [**MiniCPM-o 2.6**](https://huggingface.co/openbmb/MiniCPM-o-2_6).
Running the int4 version uses less GPU memory (about 9 GB).

### Prepare the code and install AutoGPTQ

We are submitting a PR to officially support MiniCPM-o 2.6 inference in AutoGPTQ. Until it is merged, install the patched fork:

```bash
# Clone the fork and switch to the MiniCPM-o branch
git clone https://github.com/YuzaChongyi/AutoGPTQ.git && cd AutoGPTQ
git checkout minicpmo

# Install AutoGPTQ from source
pip install -vvv --no-build-isolation -e .
```
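
As an optional sanity check, the editable install should make the loader used below importable:

```python
# Verify the patched AutoGPTQ install: this import should succeed
from auto_gptq import AutoGPTQForCausalLM
print(AutoGPTQForCausalLM.from_quantized)
```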

### Usage of **MiniCPM-o-2_6-int4**

Change the model initialization to `AutoGPTQForCausalLM.from_quantized`:

```python
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load the int4-quantized model with the exllama kernels disabled
model = AutoGPTQForCausalLM.from_quantized(
    'openbmb/MiniCPM-o-2_6-int4',
    torch_dtype=torch.bfloat16,
    device="cuda:0",
    trust_remote_code=True,
    disable_exllama=True,
    disable_exllamav2=True
)
tokenizer = AutoTokenizer.from_pretrained(
    'openbmb/MiniCPM-o-2_6-int4',
    trust_remote_code=True
)

# Initialize the TTS module for speech output
model.init_tts()
```

For usage examples, see [MiniCPM-o-2_6#usage](https://huggingface.co/openbmb/MiniCPM-o-2_6#usage).
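
For instance, a minimal single-turn image chat with the model and tokenizer loaded above might look like the sketch below. This follows the MiniCPM-o 2.6 usage docs linked above; the image path is a placeholder and the exact `chat` signature may vary between versions.

```python
from PIL import Image

# Build a single-turn multimodal message (image path is a placeholder)
image = Image.open('example.jpg').convert('RGB')
msgs = [{'role': 'user', 'content': [image, 'What is in the image?']}]

# Run chat with the int4 model and tokenizer loaded above
answer = model.chat(msgs=msgs, tokenizer=tokenizer)
print(answer)
```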