Requirements:

pip install opencv-python
pip install albumentations
pip install accelerate
pip install torch==2.2.1
pip install transformers==4.39.0  # may work with more recent versions
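
To confirm that an environment matches the pins above before loading the model, a small check along these lines may help. This is a minimal sketch, not part of the original model card: the helper name `check_pins` and the `PINS` dictionary are hypothetical, and the comparison ignores local build suffixes such as `+cu121`.

```python
from importlib import metadata

# Pins copied from the requirements list above.
PINS = {"torch": "2.2.1", "transformers": "4.39.0"}


def check_pins(pins):
    """Return {package: (expected, installed)}; installed is None if missing."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        # Strip a local build tag (e.g. "2.2.1+cu121") before comparing.
        base = installed.split("+")[0] if installed else None
        status = "ok" if base == expected else "differs"
        print(f"{name}: expected {expected}, found {installed} ({status})")
```

A mismatch is not necessarily a problem, since the card notes that newer transformers versions may work.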

Adapted sample script for Structured Radiology Report Generation (SRRG):

import tempfile

import requests
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# step 1: Set up constants
model_name = "StanfordAIMI/CheXagent-2-3b-srrg-impression"
dtype = torch.bfloat16
device = "cuda"

# step 2: Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", trust_remote_code=True)
model = model.to(dtype)
model.eval()

# step 3: Download image from URL, save to a local file, and prepare path list
url = "https://huggingface.co/IAMJB/interpret-cxr-impression-baseline/resolve/main/effusions-bibasal.jpg"
resp = requests.get(url, timeout=30)  # fail instead of hanging if the server does not respond
resp.raise_for_status()

# Use a NamedTemporaryFile so it lives on disk
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmpfile:
    tmpfile.write(resp.content)
    local_path = tmpfile.name  # this is a real file path on disk

paths = [local_path]

prompt = "Structured Radiology Report Generation for Impression Section"
# build the multimodal input
query = tokenizer.from_list_format(
    [*([{"image": img} for img in paths]), {"text": prompt}]
)

# format as a chat conversation
conv = [
    {"from": "system", "value": "You are a helpful assistant."},
    {"from": "human", "value": query},
]

# tokenize and generate
input_ids = tokenizer.apply_chat_template(
    conv, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(
    input_ids.to(device),
    do_sample=False,
    num_beams=1,
    temperature=1.0,
    top_p=1.0,
    use_cache=True,
    max_new_tokens=512,
)[0]

# decode the "impression" text: drop the prompt tokens and the final token (normally the end-of-sequence marker)
response = tokenizer.decode(output[input_ids.size(1) : -1])
print(response)
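
The script above creates the temporary image with `delete=False`, so the file stays on disk after the run. One way to clean it up automatically is a small context manager. This is a sketch under that assumption; the name `temp_image_file` is hypothetical and not part of the model's API.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_image_file(data: bytes, suffix: str = ".jpg"):
    """Write bytes to a temporary file, yield its path, then delete the file."""
    # delete=False keeps the file on disk after the handle is closed,
    # so code that expects a path can still open it.
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        yield path
    finally:
        # Remove the file once the caller is done, even if an error occurred.
        os.remove(path)
```

In the script, the download step could then become `with temp_image_file(resp.content) as local_path:` with the query construction and generation indented inside that block.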

Response:

1. Interval increase in bilateral pleural effusions.
2. Interval increase in bibasilar opacities, which may represent atelectasis or consolidation.
3. Stable position of the right upper extremity peripherally inserted central catheter (PICC line).
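
Because the response is a numbered list, it can be split into individual findings for downstream use. The helper below is a minimal sketch that assumes every finding sits on its own line in the form `N. text`, as in the sample response; `split_impression` is a hypothetical name, not something provided by the model or tokenizer.

```python
import re


def split_impression(text: str) -> list[str]:
    """Split a numbered impression into a list of finding strings."""
    items = []
    for line in text.splitlines():
        # Match lines such as "2. Some finding." and capture the text part.
        match = re.match(r"\s*\d+\.\s+(.*\S)", line)
        if match:
            items.append(match.group(1))
    return items


sample = (
    "1. Interval increase in bilateral pleural effusions.\n"
    "2. Interval increase in bibasilar opacities, which may represent "
    "atelectasis or consolidation.\n"
)
findings = split_impression(sample)
print(len(findings))  # 2
```

Lines that do not start with a number are skipped, so any unnumbered text in the output would be dropped rather than merged into a finding.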

Model size: 3.14B params (Safetensors, tensor type F32)