Empty output
I'm using transformers==4.51.2, torch==2.6.0, and Python 3.10.12.
The following self-contained code:
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
model_id = "llava-hf/llava-1.5-7b-hf"
revision = 'a272c74'
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    revision=revision,
    torch_dtype=torch.float16,
).to('cuda')
processor = AutoProcessor.from_pretrained(model_id, revision=revision)
processor.patch_size = 14
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What are these?"},
            {"type": "image"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(0, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
returns:
ER:
What are these? ASSISTANT:
That is, the answer after "ASSISTANT:" is empty (the leading "ER:" is presumably the tail of "USER:", since the decode skips the first two tokens).
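For reference, decoding only the newly generated tokens confirms that nothing is produced (a small sketch reusing inputs and output from the script above, assuming generate() returns the prompt followed by the new tokens, which is the default here):
# Slice off the prompt so only the freshly generated tokens remain.
prompt_len = inputs["input_ids"].shape[1]
new_tokens = output[0][prompt_len:]
print(len(new_tokens), "new tokens generated")
print(repr(processor.decode(new_tokens, skip_special_tokens=True)))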
I have observed the same behavior with the pipeline API:
from transformers import pipeline
pipe = pipeline("image-text-to-text", model="llava-hf/llava-1.5-7b-hf", revision='a272c74')
pipe.processor.patch_size = 14
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"},
            {"type": "text", "text": "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"},
        ],
    },
]
out = pipe(text=messages, max_new_tokens=20)
print(out)
returns:
[{'input_text': [{'role': 'user', 'content': [{'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg'}, {'type': 'text', 'text': 'What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud'}]}], 'generated_text': [{'role': 'user', 'content': [{'type': 'image', 'url': 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg'}, {'type': 'text', 'text': 'What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud'}]}, {'role': 'assistant', 'content': ''}]}]
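The assistant's reply can be pulled out of the nested output like this (a small sketch based on the structure above):
# out[0]["generated_text"] holds the whole conversation;
# the last message is the assistant's turn.
assistant_reply = out[0]["generated_text"][-1]["content"]
print(repr(assistant_reply))  # prints '' here, i.e. an empty answer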
I have observed the same behavior with other revisions and other library versions; I can report them if necessary.
Any help is welcome :)
@LeonardoBenitez
Any reason to use revision = 'a272c74'? I believe using the latest commit will work, as we haven't seen any similar issues with LLaVA from other users.
Using the latest revision I got exactly the same result: empty.
In the issue description I pinned the revision because I remember running LLaVA successfully in the past, and in another issue someone recommended this revision as a fix for a different problem.
@LeonardoBenitez hmm, I cannot reproduce the error with the latest transformers and no revision pinned. I suggest trying the same code in Colab: if it doesn't reproduce there, the problem is most probably in your local setup. Creating a new env and installing transformers from scratch is a possible fix.
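After reinstalling, a quick sanity check like this can confirm the new environment is the one actually being used (a minimal sketch):
import sys
import torch
import transformers

# Verify the interpreter and library versions picked up in the fresh env.
print(sys.executable)
print(sys.version)
print(transformers.__version__)
print(torch.__version__, torch.cuda.is_available())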
I cannot pinpoint what is wrong in my environment; it is probably something not related to LLaVA.
I was able to run this code in Colab (with quantization, because Colab doesn't have enough memory): https://colab.research.google.com/drive/1gFyGrDUp_GV6Evr4tgknl2ZCPzUHL_48?usp=sharing
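For reference, the notebook loads the model in 4-bit roughly like this (a sketch using bitsandbytes; the exact settings in the Colab may differ):
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration

# 4-bit quantization so the 7B checkpoint fits in Colab's GPU memory.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)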