merve (HF staff) committed
Commit 6fff23a
1 Parent(s): f8ca96a

Update README.md

Files changed (1)
  1. README.md +51 -37
README.md CHANGED
@@ -1,57 +1,71 @@
  ---
- base_model: HuggingFaceM4/Idefics3-8B-Llama3
- library_name: peft
  license: apache-2.0
- tags:
- - generated_from_trainer
- model-index:
- - name: idefics3-llama-vqav2
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # idefics3-llama-vqav2

- This model is a fine-tuned version of [HuggingFaceM4/Idefics3-8B-Llama3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 4
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 50
- - num_epochs: 1

- ### Training results

- ### Framework versions

- - PEFT 0.12.0
- - Transformers 4.45.0.dev0
- - Pytorch 2.3.1+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1
  ---
+ library_name: transformers
  license: apache-2.0
+ datasets:
+ - merve/vqav2-small
  ---

+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6141a88b3a0ec78603c9e784/PebmPLcCig5BlpUS99VUc.png)

+ # Idefics3Llama Fine-tuned using QLoRA on VQAv2

+ - This is the [Idefics3Llama](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) model, fine-tuned with QLoRA on a very small subset of the [VQAv2](https://huggingface.co/datasets/merve/vqav2-small) dataset (a rough sketch of such a setup is shown after this list).
+ - You can find the fine-tuning notebook [here](https://github.com/merveenoyan/smol-vision/blob/main/Idefics_FT.ipynb).

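+ With QLoRA, the base model is loaded in 4-bit and only small LoRA adapter weights are trained on top of it. The snippet below is a minimal, illustrative sketch of such a setup; the exact hyperparameters, target modules, and training loop are in the linked notebook, and the values here (rank, alpha, module names) are assumptions.

+ ```python
+ # Illustrative QLoRA setup (not the exact configuration used for this checkpoint;
+ # see the fine-tuning notebook for the real values).
+ import torch
+ from transformers import Idefics3ForConditionalGeneration, BitsAndBytesConfig
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

+ # Quantize the frozen base weights to 4-bit -- the "Q" in QLoRA.
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )

+ base = Idefics3ForConditionalGeneration.from_pretrained(
+     "HuggingFaceM4/Idefics3-8B-Llama3",
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+ base = prepare_model_for_kbit_training(base)

+ # Train only low-rank adapters; r, lora_alpha and target_modules are assumed values.
+ lora_config = LoraConfig(
+     r=8,
+     lora_alpha=8,
+     lora_dropout=0.1,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+ )
+ model = get_peft_model(base, lora_config)
+ model.print_trainable_parameters()
+ ```
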
+ ## Usage

+ You can load the base model and attach the fine-tuned adapter as follows.

+ ```python
+ from transformers import Idefics3ForConditionalGeneration, AutoProcessor

+ peft_model_id = "merve/idefics3llama-vqav2"
+ base_model_id = "HuggingFaceM4/Idefics3-8B-Llama3"
+ processor = AutoProcessor.from_pretrained(base_model_id)
+ model = Idefics3ForConditionalGeneration.from_pretrained(base_model_id)
+ model.load_adapter(peft_model_id)  # attaches the LoRA adapter in place
+ model.to("cuda")
+ ```

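+ The snippet above loads the 8B base model at full precision, which needs a fair amount of GPU memory. Since the adapter was trained with QLoRA, one option (shown as a sketch below) is to load the base model in 4-bit with bitsandbytes before attaching the adapter; the rest of the usage is unchanged.

+ ```python
+ # Sketch: load a 4-bit quantized base, then attach the adapter (for lower memory use).
+ import torch
+ from transformers import Idefics3ForConditionalGeneration, AutoProcessor, BitsAndBytesConfig

+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )

+ peft_model_id = "merve/idefics3llama-vqav2"
+ base_model_id = "HuggingFaceM4/Idefics3-8B-Llama3"

+ processor = AutoProcessor.from_pretrained(base_model_id)
+ model = Idefics3ForConditionalGeneration.from_pretrained(
+     base_model_id,
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+ model.load_adapter(peft_model_id)  # LoRA weights sit on top of the 4-bit base
+ ```
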
+ The model was conditioned on the prompt "Answer briefly.", so include it in your own messages as shown below.

+ ```python
+ from transformers.image_utils import load_image

+ DEVICE = "cuda"

+ image = load_image("https://huggingface.co/spaces/merve/OWLSAM2/resolve/main/buddha.JPG")

+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "text", "text": "Answer briefly."},
+             {"type": "image"},
+             {"type": "text", "text": "Which country is this located in?"}
+         ]
+     }
+ ]

+ text = processor.apply_chat_template(messages, add_generation_prompt=True)
+ inputs = processor(text=text, images=image, return_tensors="pt", padding=True).to(DEVICE)
+ ```

+ Now we can run inference.

+ ```python
+ generated_ids = model.generate(**inputs, max_new_tokens=500)
+ generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
+ print(generated_texts)

+ ## ['User: Answer briefly.<row_1_col_1><row_1_col_2><row_1_col_3><row_1_col_4>\n<row_2_col_1>
+ #  <row_2_col_2><row_2_col_3><row_2_col_4>\n<row_3_col_1><row_3_col_2><row_3_col_3>
+ #  <row_3_col_4>\n\n<global-img>Which country is this located in?\nAssistant: thailand\nAssistant: thailand']
+ ```
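
+ Note that the decoded string contains the whole prompt, including the image placeholder tokens. If you only want the answer text, one common pattern (a sketch, building on the variables above) is to decode only the newly generated tokens:

+ ```python
+ # Slice off the prompt tokens so only the model's answer is decoded.
+ input_len = inputs["input_ids"].shape[1]
+ answer = processor.batch_decode(
+     generated_ids[:, input_len:], skip_special_tokens=True
+ )[0].strip()
+ print(answer)  # for the example above, something like "thailand"
+ ```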