Empty output
I'm using transformers==4.51.2, torch==2.6.0, and Python 3.10.12.
The following self-contained code:
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
model_id = "llava-hf/llava-1.5-7b-hf"
revision = 'a272c74'
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    revision=revision,
    torch_dtype=torch.float16,
).to('cuda')
processor = AutoProcessor.from_pretrained(model_id, revision=revision)
processor.patch_size = 14
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What are these?"},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(0, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
returns:
ER:
What are these? ASSISTANT:
That is, the answer after "ASSISTANT:" is empty (the leading "ER:" is presumably the tail of "USER:", since the decode skips the first two tokens).
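For reference, decoding only the newly generated tokens confirms that nothing is produced (a small sketch reusing inputs and output from the script above, assuming generate() returns the prompt followed by the new tokens, which is the default here):
# Slice off the prompt so only the freshly generated tokens remain.
prompt_len = inputs["input_ids"].shape[1]
new_tokens = output[0][prompt_len:]
print(len(new_tokens), "new tokens generated")
print(repr(processor.decode(new_tokens, skip_special_tokens=True)))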
I have observed the same behavior with the pipeline API:
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="llava-hf/llava-1.5-7b-hf", revision='a272c74')
pipe.processor.patch_size = 14
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"},
            {"type": "text", "text": "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"},
        ],
    },
]
out = pipe(text=messages, max_new_tokens=20)
print(out)
returns:
[{'input_text': [{'role': 'user', 'content': [{'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg'}, {'type': 'text', 'text': 'What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud'}]}], 'generated_text': [{'role': 'user', 'content': [{'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg'}, {'type': 'text', 'text': 'What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud'}]}, {'role': 'assistant', 'content': ''}]}]
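The assistant's reply can be pulled out of the nested output like this (a small sketch based on the structure above):
# out[0]["generated_text"] holds the whole conversation;
# the last message is the assistant's turn.
assistant_reply = out[0]["generated_text"][-1]["content"]
print(repr(assistant_reply))  # prints '' here, i.e. an empty answer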
I have observed the same behavior with other revisions and other library versions; I can report them if necessary.
Any help is welcome :)
@LeonardoBenitez
Any reason to use revision = 'a272c74'? I believe using the latest commit will work, as we haven't seen any similar issues with LLaVA from other users.
Using the latest revision I got exactly the same result: empty.
In the issue description I pinned the revision because I remember running LLaVA successfully in the past, and in another issue someone recommended this revision as a fix for a different problem.
@LeonardoBenitez hmm, I cannot reproduce the error with the latest transformers and no revision pinned. I suggest trying the same code in Colab: if it doesn't reproduce there, the problem is most probably in your local setup. Creating a new env and installing transformers from scratch is a possible fix.
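After reinstalling, a quick sanity check like this can confirm the new environment is the one actually being used (a minimal sketch):
import sys
import torch
import transformers

# Verify the interpreter and library versions picked up in the fresh env.
print(sys.executable)
print(sys.version)
print(transformers.__version__)
print(torch.__version__, torch.cuda.is_available())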
I cannot pinpoint what is wrong in my environment; it is probably something not related to LLaVA.
I was able to run this code in Colab (with quantization, because Colab doesn't have enough memory): https://colab.research.google.com/drive/1gFyGrDUp_GV6Evr4tgknl2ZCPzUHL_48?usp=sharing
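For reference, the notebook loads the model in 4-bit roughly like this (a sketch using bitsandbytes; the exact settings in the Colab may differ):
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration

# 4-bit quantization so the 7B checkpoint fits in Colab's GPU memory.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)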