---
datasets:
  - MMInstruction/M3IT
language:
  - en
---

# GIT-LLM: Generative Image-to-text Transformer with Large Language Models

<img src="./images/rainbow_goose.png" width="50%" height="50%">

Welcome to the GIT-LLM repository. GIT-LLM combines the GIT vision-and-language model with the generative capabilities of a large language model (LLM). The model is fine-tuned with LoRA (Low-Rank Adaptation), optimizing it for a range of vision-and-language tasks.
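As a rough illustration of why LoRA is parameter-efficient: a rank-`r` update to a `d x k` weight matrix trains only `r * (d + k)` parameters instead of `d * k`. The dimensions below are illustrative only, not the actual GIT-LLM configuration:

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple:
    """Parameters trained by full fine-tuning vs. a rank-r LoRA update,
    where the update is B @ A with B of shape (d, r) and A of shape (r, k).
    Illustrative sketch, not tied to the GIT-LLM implementation."""
    full = d * k          # full weight matrix
    lora = r * (d + k)    # low-rank factors B and A
    return full, lora

full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora)  # 16777216 vs. 65536 trainable parameters
```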

# Examples

<img src="./images/example_result_0.jpg" width="50%" height="50%">

<img src="./images/example_result_1.jpg" width="50%" height="50%">

<img src="./images/example_result_2.jpg" width="50%" height="50%">

# Description of the uploaded weights
These weights were trained for one epoch on M3IT (excluding video and Chinese tasks). For more details, please refer to our blog post (in Japanese).
25
+
26
+ # Installation
27
+ 1. Clone this repository
28
+ ```bash
29
+ git clone https://github.com/Ino-Ichan/GIT-LLM
30
+ cd GIT-LLM
31
+ ```
32
+
33
+ 2. Install Packages
34
+ ```bash
35
+ conda create -n git_llm python=3.10 -y
36
+ conda activate git_llm
37
+ pip install --upgrade pip # enable PEP 660 support
38
+
39
+ pip install -r requirements.txt
40
+ pip install -e .
41
+ ```

## For Llama 2
First, request access to the Llama 2 models on the [Hugging Face page](https://huggingface.co/meta-llama/Llama-2-7b) and the [Meta website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).

Then sign in to your Hugging Face account:
```bash
huggingface-cli login
```

# Training

LLaMA, MPT, and OPT are currently supported as the LLM module.

```bash
./scripts/run.sh
```

# Evaluation

You can get the pretrained weights from the Hugging Face Hub: [Inoichan/GIT-Llama-2-7B](https://huggingface.co/Inoichan/GIT-Llama-2-7B)<br>
See also [notebooks](./notebooks).

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor
from git_llm.git_llama import GitLlamaForCausalLM

device_id = 0

# prepare a pretrained model
model = GitLlamaForCausalLM.from_pretrained('Inoichan/GIT-Llama-2-7B')
model.eval()
model.to(f"cuda:{device_id}")

# prepare a processor
processor = AutoProcessor.from_pretrained('Inoichan/GIT-Llama-2-7B')

# prepare inputs
url = "https://www.barnorama.com/wp-content/uploads/2016/12/03-Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw)

text = "##Instruction: Please answer the following question concretely. ##Question: What is unusual about this image? Explain precisely and concretely what he is doing. ##Answer: "

# do preprocessing
inputs = processor(
    text,
    image,
    return_tensors="pt",
    truncation=True,
)
inputs = {k: v.to(f"cuda:{device_id}") for k, v in inputs.items()}

# set eos tokens
eos_token_id_list = [
    processor.tokenizer.pad_token_id,
    processor.tokenizer.eos_token_id,
]

# do inference (greedy decoding)
with torch.no_grad():
    out = model.generate(**inputs, max_length=256, do_sample=False, eos_token_id=eos_token_id_list)

# print result
print(processor.tokenizer.batch_decode(out))
```
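The prompt in the example above follows a simple `##Instruction: ... ##Question: ... ##Answer: ` template. A small helper like the following (a hypothetical convenience function, not part of the GIT-LLM repository) can assemble it consistently:

```python
def build_prompt(instruction: str, question: str) -> str:
    """Assemble the ##Instruction/##Question/##Answer prompt template
    used in the evaluation example. Hypothetical helper, not part of GIT-LLM."""
    return (
        f"##Instruction: {instruction} "
        f"##Question: {question} ##Answer: "
    )

prompt = build_prompt(
    "Please answer the following question concretely.",
    "What is unusual about this image?",
)
print(prompt)
```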