---
language:
- ko
- en
license: cc-by-nc-sa-4.0
library_name: transformers
---
# Llama3-Chat_Vector-kor_llava

This is a Korean LLaVA model. I built it with reference to the Korean Chat Vector LLaVA model by Beomi and the Japanese Chat Vector LLaVA model by Toshi456.
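For readers unfamiliar with the term, a chat vector is usually described as the weight difference between an instruction-tuned model and its base model, added to another model that shares the same architecture. The sketch below only illustrates that arithmetic; it is not the confirmed recipe used for this model. The function name `apply_chat_vector` and the `skip_substrings` default (leaving embedding and output layers untouched, which is a common choice when vocabularies differ) are assumptions made for illustration.

```python
def apply_chat_vector(target_sd, base_sd, chat_sd,
                      skip_substrings=("embed_tokens", "lm_head")):
    """Return a new state dict: target + (chat - base), parameter by parameter.

    Hypothetical helper. Values can be torch tensors, NumPy arrays, or plain
    floats, as long as they support `+` and `-`.
    """
    merged = {}
    for name, target_w in target_sd.items():
        # Assumption: skip vocabulary-dependent layers and any parameter
        # that is missing from either reference model.
        skip = any(s in name for s in skip_substrings)
        if skip or name not in base_sd or name not in chat_sd:
            merged[name] = target_w
            continue
        chat_vector = chat_sd[name] - base_sd[name]
        merged[name] = target_w + chat_vector
    return merged


# Toy usage with floats standing in for weight tensors.
target = {"layers.0.weight": 1.0, "model.embed_tokens.weight": 5.0}
base = {"layers.0.weight": 0.25, "model.embed_tokens.weight": 1.0}
chat = {"layers.0.weight": 0.5, "model.embed_tokens.weight": 3.0}
print(apply_chat_vector(target, base, chat))
```

With real checkpoints, the three dictionaries would come from `model.state_dict()` on models of identical shape.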

### Reference Models:
1) [beomi/Llama-3-KoEn-8B-xtuner-llava-preview](https://huggingface.co/beomi/Llama-3-KoEn-8B-xtuner-llava-preview)
2) [toshi456/chat-vector-llava-v1.5-7b-ja](https://huggingface.co/toshi456/chat-vector-llava-v1.5-7b-ja)
3) [xtuner/llava-llama-3-8b-transformers](https://huggingface.co/xtuner/llava-llama-3-8b-transformers)

---
**Citation**

```bibtex
@misc {Llama3-Chat_Vector-kor_llava,
	author       = { {nebchi} },
	title        = { Llama3-Chat_Vector-kor_llava },
	year         = 2024,
	url          = { https://huggingface.co/nebchi/Llama3-Chat_Vector-kor_llava },
	publisher    = { Hugging Face }
}
```
![Seoul City](https://search.pstatic.net/common/?src=http%3A%2F%2Fimgnews.naver.net%2Fimage%2F5582%2F2018%2F04%2F20%2F0000001323_001_20180420094641826.jpg&type=sc960_832)

### Running the model on GPU
```python
import requests
from PIL import Image

import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration, TextStreamer

model_id = "nebchi/Llama3-Chat_Vector-kor_llava"

model = LlavaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype='auto', 
    device_map='auto',
    revision='a38aac3', 
)

processor = AutoProcessor.from_pretrained(model_id)

tokenizer = processor.tokenizer
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
streamer = TextStreamer(tokenizer)

# Korean prompt: "Please describe this image." The assistant turn is pre-filled with "In this image,".
prompt = ("<|start_header_id|>user<|end_header_id|>\n\n<image>\n이 이미지에 대해서 설명해주세요.<|eot_id|>"
          "<|start_header_id|>assistant<|end_header_id|>\n\n이 이미지에는")
image_file = "https://search.pstatic.net/common/?src=http%3A%2F%2Fimgnews.naver.net%2Fimage%2F5582%2F2018%2F04%2F20%2F0000001323_001_20180420094641826.jpg&type=sc960_832"

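# Download the image and build the model inputs. Keyword arguments avoid
# depending on the processor's positional argument order.
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(text=prompt, images=raw_image, return_tensors='pt').to(0, torch.float16)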

output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,  
    eos_token_id=terminators,
    no_repeat_ngram_size=3, 
    temperature=0.7,  
    top_p=0.9,  
    streamer=streamer
)
print(processor.decode(output[0][2:], skip_special_tokens=False))
```

### Results
```text
이 이미지에는 도시의 모습이 잘 보여집니다. 도시 내부에는 여러 건물과 건물들이 있고, 도시를 연결하는 도로와 교통 시스템이 잘 발달되어 있습니다. 이 도시의 특징은 높고 광범위한 건물들과 교통망을 갖춘 것이 좋습니다.
```

English translation: "This image clearly shows a city. Inside the city there are many buildings, and the roads and transportation system connecting the city are well developed. A notable feature of this city is that it has tall, extensive buildings and a transportation network."
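
The prompt in the GPU example is written by hand in the Llama 3 chat layout: a user turn containing the `<image>` placeholder and the question, followed by an open assistant turn that may be pre-filled. A small helper can make that layout easier to reuse. This is a minimal sketch; `build_llava_prompt` and its parameters are hypothetical names, not part of `transformers` or this repository.

```python
def build_llava_prompt(question: str, assistant_prefix: str = "") -> str:
    """Format one user turn with an image, then open the assistant turn.

    Hypothetical helper that mirrors the hand-written prompt string used in
    the example above.
    """
    user_turn = (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"<image>\n{question}<|eot_id|>"
    )
    # The assistant turn is left open so generation continues from
    # assistant_prefix (empty by default).
    assistant_turn = (
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{assistant_prefix}"
    )
    return user_turn + assistant_turn


# Same prompt as the example: "Please describe this image." / "In this image,"
prompt = build_llava_prompt("이 이미지에 대해서 설명해주세요.", "이 이미지에는")
```

Pre-filling the assistant turn nudges the model to answer in Korean; passing an empty prefix leaves the opening of the answer to the model.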