BaichuanMed-OCR-72B

Model Overview

BaichuanMed-OCR-72B is a model fine-tuned from Qwen2.5-VL-72B-Instruct on our constructed and curated medical report datasets, which consist of medical report images and related question-answer (QA) pairs. It has been specifically adapted to perform Optical Character Recognition (OCR) on medical report images and to answer questions based on the extracted content.

Capabilities:

  • Robust OCR Capability: Accurately recognizes complex textual content within medical report images.
  • Structured Markdown Output: Supports outputting extracted information in Markdown format.
  • Accurate Comprehension and Generation: Comprehends user queries and generates relevant, logically consistent answers derived from the OCR-extracted text.
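For illustration, a lab-test table extracted from a report image might be emitted as Markdown like the following (hypothetical values, not actual model output):

```markdown
| Test             | Result | Unit   | Reference Range |
| ---------------- | ------ | ------ | --------------- |
| Hemoglobin       | 13.2   | g/dL   | 12.0–16.0       |
| White Cell Count | 6.8    | 10^9/L | 4.0–10.0        |
```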

Evaluation

To evaluate the effectiveness of BaichuanMed-OCR-72B on specific medical report data, we conducted a benchmark test on our private dataset (similar in composition to https://huggingface.co/datasets/mrlijun/SMR-R1), comparing its performance against relevant baseline models. The accuracy results are as follows:

| Model | Acc (%) |
| --- | --- |
| Qwen2-VL-72B-Instruct | 62.9 |
| Qwen2.5-VL-72B-Instruct | 83.3 |
| BaichuanMed-OCR-72B | 88.6 |

The benchmark results indicate that BaichuanMed-OCR-72B achieves higher accuracy on this dataset, showing strong performance for tasks such as extracting key information and summarizing content from medical reports.
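The scoring protocol for the benchmark is not detailed here. As a rough sketch, accuracy on a QA benchmark of this kind can be computed by exact match between model answers and reference answers — an assumption on our part; the actual evaluation may use a more lenient matching scheme:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions exactly matching their reference answer
    after whitespace normalization (illustrative scoring only)."""
    assert len(predictions) == len(references)
    correct = sum(
        " ".join(p.split()) == " ".join(r.split())
        for p, r in zip(predictions, references)
    )
    return correct / len(predictions)

# Example: 2 of 3 answers match, so accuracy is ~0.667.
acc = exact_match_accuracy(
    ["13.2 g/dL", "normal", "yes"],
    ["13.2  g/dL", "normal", "no"],
)
```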

Usage

Usage is the same as for Qwen2.5-VL. Here is an example of using the chat model with transformers and qwen_vl_utils:

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "baichuan-inc/BaichuanMed-OCR-72B", torch_dtype="auto", device_map="auto"
)

# default processor
processor = AutoProcessor.from_pretrained("baichuan-inc/BaichuanMed-OCR-72B")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "your image url or PATH here.",
            },
            # Prompt (in Chinese): "Extract the text and important symbols from the
            # image; output any tables as tables. If there is a watermark, recognize
            # it only once and do not repeat the same watermark content."
            {"type": "text", "text": "提取图片中的文字和重要符号,有表格的就用表格输出。如果有水印,只需要识别一次,不要重复输出同样的水印内容。"},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=8192)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
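To ask follow-up questions about the extracted content, the OCR turn and the model's answer can be appended to the conversation before re-running the same preparation and generation steps. A minimal sketch of building such a multi-turn message list (the helper name and the sample texts are illustrative, not part of the model's API):

```python
def build_followup_messages(image_ref, ocr_prompt, ocr_answer, question):
    """Build a Qwen2.5-VL-style multi-turn conversation: the OCR request with
    the image, the model's extracted text, then a follow-up question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_ref},
                {"type": "text", "text": ocr_prompt},
            ],
        },
        {"role": "assistant", "content": [{"type": "text", "text": ocr_answer}]},
        {"role": "user", "content": [{"type": "text", "text": question}]},
    ]

messages = build_followup_messages(
    "your image url or PATH here.",
    "Extract the text from the report.",
    "| Test | Result |\n| --- | --- |\n| Hemoglobin | 13.2 g/dL |",
    "Is the hemoglobin value within the normal range?",
)
# `messages` can then be passed through apply_chat_template and
# process_vision_info exactly as in the example above.
```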

Citation

If you find our work helpful, please cite it as follows:

@misc{lijun2025BaichuanMed-OCR-72B,
  author = {Lijun Liu and Tao Zhang and Tao Zhang and Chong Li and Mingrui Wang and Chenglin Zhu and Mingan Lin and Zenan Zhou and Weipeng Chen},
  title  = {BaichuanMed-OCR-72B: A powerful medical report OCR recognition model},
  year   = {2025}
}