How to have a continuous conversation

#19
by sucongCJS

Thanks for your amazing work!
According to your script, we can only provide a single input. If I want to ask the model more than one question, what should I do?
If I make more than one input, the answer is completely irrelevant to the image...
Here is my experiment:

import torch
from PIL import Image

# assumes `model`, `processor`, and `device` are already initialized
prompt = "USER: <image>\nwhat is the image about\nASSISTANT:"
raw_image = Image.open("/home/ubuntu/code/textual_inversion/zzz/sea.jpg")
inputs = processor(prompt, raw_image, return_tensors="pt").to(device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
# decoding from token index 2 drops part of "USER", hence "ER:" in the output below
print(processor.decode(output[0][2:], skip_special_tokens=True))

The image I provided: image.png

Output (which is normal):
ER:
what is the image about
ASSISTANT: The image features a large body of water with a few boats scattered throughout the scene. The water appears to be calm and serene, with a few sailboats and a yacht visible in the distance. The sky above the water is clear and blue, creating a picturesque view of the ocean. The boats are positioned at various distances from each other, adding depth and interest to the scene.

For the second input, which has no image, I want the model to answer the question with reference to the image I provided before.

prompt = "USER: is the image positive? can you describe the image again?\nASSISTANT:"
# no image is passed this time
inputs = processor(prompt, return_tensors="pt").to(device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))

Output (which is irrelevant to the image):
ER: is the image positive? can you describe the image again?
ASSISTANT: The image is a positive image of a human brain. It is a close-up view of the brain, showing its intricate structure and details. The image is in black and white, which adds to the dramatic and artistic nature of the photograph. The brain is the main subject of the image, and it is the focal point of the photograph.

Hi @sucongCJS,

Were you able to find a way to do this in a script?


In that case, you should append the previous message + image to the prompt before feeding it back to the model.
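Concretely, something like this (a minimal sketch, reusing the `processor`/`model`/`raw_image`/`device` variables from the snippets above; `first_answer` is a hypothetical variable holding the text generated in the first round):

followup_prompt = (
    "USER: <image>\nwhat is the image about\n"
    "ASSISTANT: " + first_answer + "\n"  # paste the previous answer back in
    "USER: is the image positive? can you describe the image again?\n"
    "ASSISTANT:"
)
inputs = processor(followup_prompt, raw_image, return_tensors="pt").to(device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))

The two key points are that the <image> placeholder stays in the prompt and that the same image is passed to the processor on every call, so the model can still ground the follow-up question.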

Thanks @nielsr.

Yes, I just tested this with a conversation loop that keeps appending the past USER and ASSISTANT turns, and it worked well.

Hi @ggcristian,
Can you share the code for this? I am not sure how to do it.

Hi @prakashshubham, this is what I did to conduct a multi-round conversation; I hope you find it helpful.

queries = [
    "<image>\nHow many animated characters are there in this image?",
    "Answer with a single number in decimal format. Give no explanations."
]

def generate_response(image):
    chat = []
    for query in queries:
        # append the new user turn and rebuild the prompt from the full history
        chat.append({"role": "user", "content": query})
        prompt = text_processor.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
        inputs = processor(prompt, image, return_tensors="pt").to(device)

        with torch.no_grad():
            output = model.generate(**inputs, max_new_tokens=300)
        output = processor.decode(output[0], skip_special_tokens=True)

        # the decoded output echoes the prompt, so measure the decoded
        # prompt length and slice it off to keep only the new answer
        input_ids = inputs["input_ids"]
        cutoff = len(text_processor.decode(
            input_ids[0],
            skip_special_tokens=True,
            clean_up_tokenization_spaces=True,
        ))
        answer = output[cutoff:]
        # record the assistant turn so the next round sees the whole conversation
        chat.append({"role": "assistant", "content": answer})
    return answer
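For completeness, here is a setup sketch that would make the snippet above self-contained. The checkpoint name and the `text_processor = processor.tokenizer` line are my assumptions, not something fixed in this thread:

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint, swap in your own
model = LlavaForConditionalGeneration.from_pretrained(model_id).to(device)
processor = AutoProcessor.from_pretrained(model_id)
text_processor = processor.tokenizer  # assumption: the tokenizer carries the chat template

print(generate_response(Image.open("sea.jpg")))  # "sea.jpg" is a stand-in path

With this in place, generate_response runs both queries against the same image, carrying the first answer into the second round's prompt.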

I was later able to do it myself. But still, thanks for this.
