import time
from threading import Thread

import gradio as gr
import requests
import spaces
import torch
from PIL import Image
# from transformers import AutoProcessor, LlavaForConditionalGeneration
from transformers import (
    LlavaNextForConditionalGeneration,
    LlavaNextProcessor,
    TextIteratorStreamer,
)

PLACEHOLDER = """

LLaVA-Llama-3-8B

Llava-Llama-3-8b is a LLaVA model fine-tuned from Meta-Llama-3-8B-Instruct and CLIP-ViT-Large-patch14-336 with ShareGPT4V-PT and InternVL-SFT by XTuner

"""

#####################
'''
processor = LlavaNextProcessor.from_pretrained("tiiuae/falcon-11B-vlm", tokenizer_class='PreTrainedTokenizerFast')
model = LlavaNextForConditionalGeneration.from_pretrained("tiiuae/falcon-11B-vlm", torch_dtype=torch.bfloat16)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
cats_image = Image.open(requests.get(url, stream=True).raw)
instruction = 'Write a long paragraph about this picture.'

prompt = f"""User:<image>\n{instruction} Falcon:"""
inputs = processor(prompt, images=cats_image, return_tensors="pt", padding=True).to('cuda:0')

model.to('cuda:0')
output = model.generate(**inputs, max_new_tokens=256)

prompt_length = inputs['input_ids'].shape[1]
generated_captions = processor.decode(output[0], skip_special_tokens=True).strip()

print(generated_captions)
'''
#############################

# model_id = "xtuner/llava-llama-3-8b-v1_1-transformers"
model_id = "tiiuae/falcon-11B-vlm"

# processor = AutoProcessor.from_pretrained(model_id)
processor = LlavaNextProcessor.from_pretrained(model_id, tokenizer_class="PreTrainedTokenizerFast")
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    # torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
# model = LlavaForConditionalGeneration.from_pretrained(
#     model_id,
#     torch_dtype=torch.float16,
#     low_cpu_mem_usage=True,
# )
model.to("cuda:0")
# model.generation_config.eos_token_id = 128009


@spaces.GPU
def bot_streaming(message, history):
    print(message)
    image = None
    if message["files"]:
        # message["files"][-1] is either a dict or a plain path string
        last_file = message["files"][-1]
        image = last_file["path"] if isinstance(last_file, dict) else last_file
    else:
        # No image uploaded for this turn: look for images in past turns,
        # where they are kept inside tuples, and take the last one.
        for hist in history:
            if isinstance(hist[0], tuple):
                image = hist[0][0]

    if image is None:
        # gr.Error only reaches the UI when it is raised
        raise gr.Error("You need to upload an image for LLaVA to work.")

    # NOTE: this is the Llama-3 chat template; the commented-out example above
    # prompts the Falcon checkpoint with a "User: ... Falcon:" format instead.
    prompt = f"<|start_header_id|>user<|end_header_id|>\n\n<image>\n{message['text']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    # print(f"prompt: {prompt}")
    image = Image.open(image)
    # Cast floating-point inputs to the model's dtype (bfloat16) to avoid a dtype mismatch
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(0, model.dtype)

    streamer = TextIteratorStreamer(processor, skip_special_tokens=False, skip_prompt=True)
    generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=1024, do_sample=False)

    # Run generation in a background thread so tokens can be streamed as they arrive
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    buffer = ""
    time.sleep(0.5)
    for new_text in streamer:
        # Drop <|eot_id|> and anything after it
        if "<|eot_id|>" in new_text:
            new_text = new_text.split("<|eot_id|>")[0]
        buffer += new_text
        time.sleep(0.06)
        # print(f"new_text: {buffer}")
        yield buffer


chatbot = gr.Chatbot(placeholder=PLACEHOLDER, scale=1)
chat_input = gr.MultimodalTextbox(
    interactive=True,
    file_types=["image"],
    placeholder="Enter message or upload file...",
    show_label=False,
)
with gr.Blocks(fill_height=True) as demo:
    gr.ChatInterface(
        fn=bot_streaming,
        title="FalconVLM",
        examples=[
            {"text": "What is on the flower?", "files": ["./bee.jpg"]},
            {"text": "How to make this pastry?", "files": ["./baklava.png"]},
        ],
        description="Try [LLaVA Llama-3-8B](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers). Upload an image and start chatting about it, or simply try one of the examples below. If you don't upload an image, you will receive an error.",
        stop_btn="Stop Generation",
        multimodal=True,
        textbox=chat_input,
        chatbot=chatbot,
    )

demo.queue(api_open=False)
demo.launch(show_api=False, share=False)