|
--- |
|
pipeline_tag: any-to-any |
|
datasets: |
|
- openbmb/RLAIF-V-Dataset |
|
library_name: transformers |
|
language: |
|
- multilingual |
|
tags: |
|
- minicpm-o |
|
- omni |
|
- vision |
|
- ocr |
|
- multi-image |
|
- video |
|
- custom_code |
|
- audio |
|
- speech |
|
- voice cloning |
|
- live streaming
|
- realtime speech conversation |
|
- asr |
|
- tts |
|
--- |
|
|
|
<h1>A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone</h1> |
|
|
|
## MiniCPM-o 2.6 int4 |
|
This is the int4 quantized version of [**MiniCPM-o 2.6**](https://huggingface.co/openbmb/MiniCPM-o-2_6). |
|
Running the int4 version uses less GPU memory (about 9 GB).
|
|
|
### Prepare code and install AutoGPTQ |
|
|
|
We are submitting a PR to AutoGPTQ to officially support MiniCPM-o 2.6 inference. Until it is merged, install from the `minicpmo` branch of our fork:
|
|
|
```bash
|
# Clone the OpenBMB fork and switch to the MiniCPM-o branch
git clone https://github.com/OpenBMB/AutoGPTQ.git && cd AutoGPTQ
git checkout minicpmo

# Build and install AutoGPTQ from source (editable mode)
pip install -vvv --no-build-isolation -e .
|
``` |
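
A quick import check can confirm that the fork installed correctly (a minimal sketch; the exact version string depends on what the branch ships):

```python
# Sanity check: the fork should be importable and expose the quantized loader.
import auto_gptq
from auto_gptq import AutoGPTQForCausalLM

print(auto_gptq.__version__)
```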
|
|
|
### Usage of **MiniCPM-o-2_6-int4** |
|
|
|
Change the model initialization to use `AutoGPTQForCausalLM.from_quantized`; everything else works as with the full-precision model.
|
|
|
```python |
|
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load the int4 quantized weights; the exllama kernels are disabled
# in favor of the default CUDA kernels.
model = AutoGPTQForCausalLM.from_quantized(
    'openbmb/MiniCPM-o-2_6-int4',
    torch_dtype=torch.bfloat16,
    device="cuda:0",
    trust_remote_code=True,
    disable_exllama=True,
    disable_exllamav2=True
)

tokenizer = AutoTokenizer.from_pretrained(
    'openbmb/MiniCPM-o-2_6-int4',
    trust_remote_code=True
)

# Initialize the TTS module so speech output is available.
model.init_tts()
|
``` |
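
As a quick end-to-end check, the sketch below follows the base model card's `chat` interface for a single-turn image question; `example.jpg` is a placeholder path, and the message format is taken from the linked usage guide:

```python
from PIL import Image
import torch

# Placeholder image; replace 'example.jpg' with your own file.
image = Image.open('example.jpg').convert('RGB')

# Single-turn conversation in the message format of the base model card.
msgs = [{'role': 'user', 'content': [image, 'Describe this image.']}]

answer = model.chat(msgs=msgs, tokenizer=tokenizer)
print(answer)

# Optional: confirm the ~9 GB memory footprint mentioned above.
print(f"GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GiB")
```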
|
|
|
For detailed usage, refer to [MiniCPM-o-2_6#usage](https://huggingface.co/openbmb/MiniCPM-o-2_6#usage).
|
|