Commit fce59a5 by alakxender (verified) · 1 Parent(s): 19e604d

Update README.md

  1. README.md +53 -14
README.md CHANGED
@@ -1,17 +1,19 @@
1
  ---
2
  base_model: facebook/nougat-base
3
  library_name: transformers
 
4
  tags:
5
  - generated_from_trainer
6
  model-index:
7
  - name: dhivehi-nougat-base
8
  results: []
9
  ---
10
 
11
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
12
- should probably proofread and complete it, then remove this comment. -->
13
-
14
- # dhivehi-nougat-base
15
 
16
  This model is a fine-tuned version of [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) on the None dataset.
17
  It achieves the following results on the evaluation set:
@@ -19,15 +21,52 @@ It achieves the following results on the evaluation set:
19
 
20
  ## Model description
21
 
22
- More information needed
23
-
24
- ## Intended uses & limitations
25
-
26
- More information needed
27
-
28
- ## Training and evaluation data
29
-
30
- More information needed
31
 
32
  ## Training procedure
33
 
@@ -144,4 +183,4 @@ The following hyperparameters were used during training:
144
  - Transformers 4.47.0
145
  - Pytorch 2.6.0+cu124
146
  - Datasets 3.2.0
147
- - Tokenizers 0.21.0
 
1
  ---
2
  base_model: facebook/nougat-base
3
  library_name: transformers
4
+ license: cc-by-4.0
5
  tags:
6
  - generated_from_trainer
7
  model-index:
8
  - name: dhivehi-nougat-base
9
  results: []
10
+ datasets:
11
+ - alakxender/dhivehi-image-text
12
+ language:
13
+ - dv
14
  ---
15
 
16
+ # DHIVEHI NOUGAT BASE (IMAGE-TO-TEXT)
 
 
 
17
 
18
  This model is a fine-tuned version of [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) on the None dataset.
19
  It achieves the following results on the evaluation set:
 
21
 
22
  ## Model description
23
 
24
+ Fine-tuned for Dhivehi image-to-text on the `all` config of the alakxender/dhivehi-image-text dataset.
25
+
26
+ ## Usage
27
+
+ ```python
+ from PIL import Image
+ import torch
+ from transformers import NougatProcessor, VisionEncoderDecoderModel
+ from pathlib import Path
+
+ # Load the model and processor
+ processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base")
+ model = VisionEncoderDecoderModel.from_pretrained(
+     "alakxender/dhivehi-nougat-base",
+     torch_dtype=torch.bfloat16,  # Optional: Load the model with BF16 data type for faster inference and lower memory usage
+     attn_implementation={  # Optional: Specify the attention kernel implementations for different parts of the model
+         "decoder": "flash_attention_2",  # Use FlashAttention-2 for the decoder for improved performance
+         "encoder": "eager"  # Use the default ("eager") attention implementation for the encoder
+     }
+ )
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model.to(device)
+
+ context_length = 128
+
+ def predict(img_path):
+     # Ensure image is in RGB format
+     image = Image.open(img_path).convert("RGB")
+     pixel_values = processor(image, return_tensors="pt").pixel_values.to(torch.bfloat16)
+
+     # generate prediction
+     outputs = model.generate(
+         pixel_values.to(device),
+         min_length=1,
+         max_new_tokens=context_length,
+         repetition_penalty=1.5,
+         bad_words_ids=[[processor.tokenizer.unk_token_id]],
+         eos_token_id=processor.tokenizer.eos_token_id,
+     )
+
+     page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
+     return page_sequence
+
+ print(predict("DV01-04_31.jpg"))
+ ```
70
 
71
  ## Training procedure
72
 
 
183
  - Transformers 4.47.0
184
  - Pytorch 2.6.0+cu124
185
  - Datasets 3.2.0
186
+ - Tokenizers 0.21.0
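
Note on the Usage snippet added in this commit: it always requests BF16 and FlashAttention-2 for the decoder, yet it also falls back to `"cpu"` when CUDA is unavailable. FlashAttention-2 needs a CUDA GPU and the separate `flash_attn` package, so the two "Optional" arguments may need to be dropped on other machines. The following is only a sketch of how that choice could be made up front; `choose_load_settings` and `flash_attn_installed` are hypothetical helpers, not part of the README or of `transformers`, and the fallback values (`"eager"` attention, `float32` on CPU) are assumptions rather than the author's recommendation.

```python
import importlib.util


def flash_attn_installed() -> bool:
    """Return True if the optional `flash_attn` package can be imported."""
    return importlib.util.find_spec("flash_attn") is not None


def choose_load_settings(has_cuda: bool, has_flash_attn: bool) -> dict:
    """Pick device, dtype name, and attention kernels for the environment.

    Hypothetical helper: the returned dict only describes the choice.
    The caller is expected to map `dtype_name` to a torch dtype, for
    example with `getattr(torch, settings["dtype_name"])`.
    """
    if not has_cuda:
        # Assumed CPU fallback: full precision and the default attention.
        return {
            "device": "cpu",
            "dtype_name": "float32",
            "attn_implementation": {"decoder": "eager", "encoder": "eager"},
        }
    # On CUDA, use FlashAttention-2 for the decoder only when it is installed.
    decoder_attn = "flash_attention_2" if has_flash_attn else "eager"
    return {
        "device": "cuda",
        "dtype_name": "bfloat16",
        "attn_implementation": {"decoder": decoder_attn, "encoder": "eager"},
    }
```

With `torch` available, the flags would typically come from `torch.cuda.is_available()` and `flash_attn_installed()`, and the resulting `attn_implementation` dict and dtype would be passed to `VisionEncoderDecoderModel.from_pretrained` in place of the hard-coded values. Whether BF16 is efficient on a given GPU is hardware-dependent and is not checked here.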