BabyChou committed on
Commit eeb5be0 · verified · 1 Parent(s): dc6c541

Update README.md

Files changed (1)
  1. README.md +37 -0
README.md CHANGED
@@ -14,6 +14,43 @@ Llama-3.2-SFT-Vision-Arena is a chat assistant trained by fine-tuning Llama-3.2-
  - Repository: https://github.com/lm-sys/FastChat
  - Paper: https://arxiv.org/abs/2412.08687
 
+ ### Sample Inference Code
+ ```
+ import requests
+ import torch
+ from PIL import Image
+ from transformers import MllamaForConditionalGeneration, AutoProcessor
+
+ model_id = "lmarena-ai/llama-3.2-sft-vision-arena"
+
+ model = MllamaForConditionalGeneration.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+ processor = AutoProcessor.from_pretrained(model_id)
+
+ url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ messages = [
+     {"role": "user", "content": [
+         {"type": "image"},
+         {"type": "text", "text": "Write a haiku about this image: "}
+     ]}
+ ]
+ input_text = processor.tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+ inputs = processor(
+     image,
+     input_text,
+     add_special_tokens=False,
+     return_tensors="pt"
+ ).to(model.device)
+
+ output = model.generate(**inputs, max_new_tokens=30)
+ print(processor.decode(output[0]))
+ ```
+
  ### Uses
  The primary use of Llama-3.2-SFT-Vision-Arena is research on vision language models and chatbots. The primary intended users of the model are researchers and hobbyists in natural language processing, machine learning, and artificial intelligence.