Empty output

#52
by LeonardoBenitez - opened

I'm using transformers==4.51.2, torch==2.6.0 and python==3.10.12.
The following self-contained code:

import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
revision = 'a272c74'
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    revision=revision,
    torch_dtype=torch.float16,
).to('cuda')
processor = AutoProcessor.from_pretrained(model_id, revision=revision)
processor.patch_size = 14

conversation = [
    {
      "role": "user",
      "content": [
          {"type": "text", "text": "What are these?"},
          {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(0, torch.float16)

output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
# [2:] skips only the first two tokens, so the decoded text still includes the rest of the prompt
print(processor.decode(output[0][2:], skip_special_tokens=True))

returns:

ER:  
What are these? ASSISTANT:

That is, the output/answer is empty.
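One way to confirm that nothing was generated, rather than the answer being lost during decoding, is to look only at the tokens that follow the prompt. This is a minimal sketch: the helper name `generated_only` is hypothetical, and the token ids are stand-ins for illustration, not real LLaVA ids. With the objects from the snippet above, the call would presumably be `generated_only(output[0], inputs["input_ids"].shape[1])`.

```python
def generated_only(output_ids, prompt_len):
    """Return the token ids that follow the first `prompt_len` prompt tokens."""
    return output_ids[prompt_len:]


# Stand-in ids for illustration only (hypothetical values).
prompt_ids = [1, 10, 11, 12]
full_output = prompt_ids + [2]  # prompt followed by a single extra id

new_ids = generated_only(full_output, len(prompt_ids))
print(len(new_ids), new_ids)
```

If the slice is empty or holds only the end-of-sequence token, the model most likely stopped right away, which would point at the inputs or the environment rather than at the decoding step.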


I have observed the same behavior with the pipeline API:

from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-1.5-7b-hf", revision='a272c74')
pipe.processor.patch_size = 14
messages = [
    {
      "role": "user",
      "content": [
          {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"},
          {"type": "text", "text": "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"},
        ],
    },
]

out = pipe(text=messages, max_new_tokens=20)
print(out)

returns:

[{'input_text': [{'role': 'user', 'content': [{'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg'}, {'type': 'text', 'text': 'What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud'}]}], 'generated_text': [{'role': 'user', 'content': [{'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg'}, {'type': 'text', 'text': 'What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud'}]}, {'role': 'assistant', 'content': ''}]}]
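The empty answer is easy to miss in the nested structure printed above. The sketch below pulls out the last assistant message so it can be checked directly. The helper name `assistant_reply` is hypothetical, and `out` is a trimmed copy of the printed structure with the message text elided.

```python
def assistant_reply(pipeline_output):
    """Return the content of the last assistant message, or None if there is none."""
    messages = pipeline_output[0]["generated_text"]
    for message in reversed(messages):
        if message["role"] == "assistant":
            return message["content"]
    return None


# Trimmed copy of the structure shown above; "..." stands for the elided text.
out = [{
    "input_text": [{"role": "user", "content": [{"type": "text", "text": "..."}]}],
    "generated_text": [
        {"role": "user", "content": [{"type": "text", "text": "..."}]},
        {"role": "assistant", "content": ""},
    ],
}]

reply = assistant_reply(out)
print(repr(reply))
if not reply:
    print("assistant reply is empty")
```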

I have observed the same behavior with other revisions and other library versions; I can report them if necessary.

Any help is welcome :)

Llava Hugging Face org

@LeonardoBenitez any reason to use revision = 'a272c74'? I believe the latest commit will work, as we haven't seen similar issues with LLaVA from other users.

Using the latest revision I got exactly the same result: empty.
I pinned the revision in the issue description because I remember running LLaVA successfully in the past, and in another issue someone recommended this revision for a different problem.

Llava Hugging Face org

@LeonardoBenitez hmm, I cannot reproduce the error with the latest transformers and no revision specified. I suggest trying the same code in Colab to see whether it reproduces there; if it doesn't, there is most probably an issue in your local setup. As a possible fix, you can create a new environment and install transformers from scratch.
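When comparing a local setup against Colab, it can help to print the same version report in both places and diff the results. The sketch below uses only the standard library. The function name `environment_report` and the default package list are assumptions chosen for illustration, not something prescribed by transformers.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages=("transformers", "torch", "pillow", "accelerate")):
    """Collect the Python version and the installed versions of the given packages."""
    report = {"python": platform.python_version(), "executable": sys.executable}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running it once locally and once in Colab gives two short listings that can be compared line by line.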

Llava Hugging Face org

Yes, we also test the LLaVA models here to ensure they keep outputting correct text.

I cannot pinpoint what is wrong in my environment; it is probably something unrelated to LLaVA.
I was able to run this code in Colab (with quantization, because Colab doesn't have enough memory): https://colab.research.google.com/drive/1gFyGrDUp_GV6Evr4tgknl2ZCPzUHL_48?usp=sharing

LeonardoBenitez changed discussion status to closed
