HuatuoGPT-Vision-34B-hf

Introduction

This is the Hugging Face LLaVA version of HuatuoGPT-Vision-34B, compatible with vLLM and other frameworks. The original model is available here: HuatuoGPT-Vision-34B.

Quick Start

1. Deploy the model using vLLM

python -m vllm.entrypoints.openai.api_server \
--model huatuogpt_vision_model_path  \
--tensor_parallel_size 2 \
--gpu_memory_utilization 0.8 \
--served-model-name huatuogpt_vision_34b \
--chat-template "{%- if messages[0]['role'] == 'system' -%}\n    {%- set system_message = messages[0]['content'] -%}\n    {%- set messages = messages[1:] -%}\n{%- else -%}\n    {% set system_message = '' -%}\n{%- endif -%}\n\n{%- for message in messages -%}\n    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}\n        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}\n    {%- endif -%}\n\n    {%- if message['role'] == 'user' -%}\n        {{ '<|user|>\n' + message['content'] + '\n' }}\n    {%- elif message['role'] == 'assistant' -%}\n        {{ '<|assistant|>\n' + message['content'] + '\n' }}\n    {%- endif -%}\n{%- endfor -%}\n\n{%- if add_generation_prompt -%}\n    {{ '<|assistant|>' }}\n{% endif %}" \
--port 9559 --max-model-len 2048 > vllm_openai_server.log 2>&1 &
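
The `--chat-template` argument above is hard to read as a single escaped line. As a rough illustration, the plain-Python sketch below mirrors what that template appears to do: a leading system message is split off (the template stores it in `system_message` but never emits it), roles must alternate starting with the user, and each turn is wrapped in `<|user|>` or `<|assistant|>` markers. The function name `render_prompt` is hypothetical, and the exact whitespace the server produces (for example, a trailing newline after the final `<|assistant|>`) may differ; the sketch only shows the prompt layout.

```python
def render_prompt(messages, add_generation_prompt=True):
    """Approximate the prompt layout of the chat template (hypothetical helper)."""
    # A leading system message is split off; the template never emits it.
    if messages and messages[0]["role"] == "system":
        messages = messages[1:]

    parts = []
    for index, message in enumerate(messages):
        # Turns must alternate user/assistant, starting with the user.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        if message["role"] == "user":
            parts.append("<|user|>\n" + message["content"] + "\n")
        elif message["role"] == "assistant":
            parts.append("<|assistant|>\n" + message["content"] + "\n")

    # The generation prompt cues the model to produce the next assistant turn.
    if add_generation_prompt:
        parts.append("<|assistant|>")
    return "".join(parts)


prompt = render_prompt([
    {"role": "user", "content": "<image>\nWhat does the picture show?"},
])
print(prompt)
```

If this reading of the template is right, a system prompt sent through the API would have no effect on the rendered prompt, so instructions are probably best placed in the user turn.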

2. Model inference

from openai import OpenAI
from PIL import Image
import base64
import io

def get_image(image_path):
    # Read the format before convert(): the converted copy reports format=None.
    with Image.open(image_path) as src:
        img_type = src.format or 'PNG'  # e.g. 'JPEG', 'PNG'; default to PNG
        image = src.convert('RGB')
    byte_arr = io.BytesIO()
    image.save(byte_arr, format=img_type)
    # Base64-encode the bytes; the lowercase type is used in the data URL.
    image = base64.b64encode(byte_arr.getvalue()).decode()
    return image, img_type.lower()


client = OpenAI(
    base_url="http://localhost:9559/v1",
    api_key="token-abc123"
)
image_path = 'your_image_path'
image, img_type = get_image(image_path)


inputcontent = [{
    "type": "text",
    "text": '<image>\nWhat does the picture show?'
}]

inputcontent.append({
    "type": "image_url",
    "image_url": {
        "url": f"data:image/{img_type};base64,{image}"
    }
})

response = client.chat.completions.create(
    model="huatuogpt_vision_34b",
    messages=[
        {"role": "user", "content": inputcontent}
    ],
    temperature=0.2
)
print(response.choices[0].message.content)
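
Because the deploy command runs in the background and writes to `vllm_openai_server.log`, the client script can fail with a connection error if it starts before the model has finished loading. A minimal sketch of a readiness check follows, assuming the server exposes the OpenAI-compatible `GET /v1/models` route without an API key (the deploy command above does not set one). The helper names `server_is_up` and `wait_for_server`, and the default timeout and polling interval, are illustrative choices rather than part of the original instructions.

```python
import time
import urllib.request


def server_is_up(base_url):
    """Return True if GET {base_url}/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and refused connections are all OSError subclasses.
        return False


def wait_for_server(base_url, timeout_s=600.0, interval_s=5.0, probe=server_is_up):
    """Poll until the probe succeeds or timeout_s elapses (hypothetical helper)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(base_url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_server("http://localhost:9559/v1")` before creating the `OpenAI` client would block until the server responds or the timeout passes; if it returns `False`, the log file is the first place to look.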

Citation

@misc{chen2024huatuogptvisioninjectingmedicalvisual,
      title={HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale}, 
      author={Junying Chen and Ruyi Ouyang and Anningzhe Gao and Shunian Chen and Guiming Hardy Chen and Xidong Wang and Ruifei Zhang and Zhenyang Cai and Ke Ji and Guangjun Yu and Xiang Wan and Benyou Wang},
      year={2024},
      eprint={2406.19280},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.19280}, 
}

Model size: 35B params (Safetensors). Tensor type: BF16.