Error during inference with image and text.

#12 opened by aarbelle

I'm running into the following error when trying image + text inference:

```
/home/.cache/huggingface/modules/transformers_modules/microsoft/Phi-4-multimodal-instruct/879783f7b23e43c12d1c682e3458f115f3a7718d/modeling_phi4mm.py", line 399, in forward
    assert temp_len == output_imgs[-1].shape[1], f'temp_len: {temp_len}, output_imgs[-1].shape[1]: {output_imgs[-1].shape[1]}'
AssertionError: temp_len: 5409, output_imgs[-1].shape[1]: 5393
```

It doesn't happen for all images, just some.
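
Since it only fails for some images, my guess is that the mismatch depends on the image dimensions and how the preprocessor splits the image into crops. Below is a minimal sketch of a possible workaround that pads the image so both sides are multiples of 448 before handing it to the processor; the 448 crop size is an assumption on my part, and this is not a confirmed fix.

```python
import requests
from PIL import Image

def pad_to_multiple(image: Image.Image, multiple: int = 448) -> Image.Image:
    """Pad on the right/bottom so width and height are multiples of `multiple`.

    The 448 default is an assumption about the crop size used by the
    image preprocessor, not a value confirmed from the model code.
    """
    w, h = image.size
    new_w = -(-w // multiple) * multiple  # ceiling to the next multiple
    new_h = -(-h // multiple) * multiple
    if (new_w, new_h) == (w, h):
        return image
    padded = Image.new("RGB", (new_w, new_h), (255, 255, 255))  # white padding
    padded.paste(image, (0, 0))
    return padded

url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = pad_to_multiple(Image.open(requests.get(url, stream=True).raw).convert("RGB"))
# then pass `image` to the processor as usual
```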

Same error with:

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

# Define model path
model_path = "microsoft/Phi-4-multimodal-instruct"

# Load model and processor
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="mps",
    trust_remote_code=True,
    _attn_implementation='eager',
).to("mps")

# Load generation config
generation_config = GenerationConfig.from_pretrained(model_path)

# Define prompt structure
user_prompt = '<|user|>'
assistant_prompt = '<|assistant|>'
prompt_suffix = '<|end|>'

# Part 1: Image Processing
print("\n--- IMAGE PROCESSING ---")
image_url = 'https://www.ilankelman.org/stopsigns/australia.jpg'
prompt = f'{user_prompt}<|image_1|>What is shown in this image?{prompt_suffix}{assistant_prompt}'
print(f'>>> Prompt\n{prompt}')

# Download and open image
image = Image.open(requests.get(image_url, stream=True).raw)
inputs = processor(text=prompt, images=image, return_tensors='pt').to('mps')

# Generate response
generate_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    generation_config=generation_config,
)
response = processor.batch_decode(
    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(f'>>> Response\n{response}')
```

Error:

File "/Users/ericbuehler/.cache/huggingface/modules/transformers_modules/phi4_multimodal/modeling_phi4mm.py", line 399, in forward
    assert temp_len == output_imgs[-1].shape[1], f'temp_len: {temp_len}, output_imgs[-1].shape[1]: {output_imgs[-1].shape[1]}'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: temp_len: 1381, output_imgs[-1].shape[1]: 933```
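
For anyone debugging this, it may help to dump the shapes of the processor outputs for a failing image before calling `generate`, and compare them against the numbers in the assertion. This is a generic inspection snippet using the `processor`, `prompt`, and `image` from the script above; it makes no assumptions about which keys the processor returns.

```python
# Inspect what the processor produced for a problematic image.
# Only prints tensor shapes; assumes no specific key names.
inputs = processor(text=prompt, images=image, return_tensors="pt")
for name, value in inputs.items():
    shape = getattr(value, "shape", None)
    print(name, tuple(shape) if shape is not None else type(value).__name__)
```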
