Update README.md
README.md CHANGED
@@ -1,17 +1,19 @@
 ---
 base_model: facebook/nougat-base
 library_name: transformers
+license: cc-by-4.0
 tags:
 - generated_from_trainer
 model-index:
 - name: dhivehi-nougat-base
   results: []
+datasets:
+- alakxender/dhivehi-image-text
+language:
+- dv
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# dhivehi-nougat-base
+# DHIVEHI NOUGAT BASE (IMAGE-TO-TEXT)
 
-This model is a fine-tuned version of [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) on the None dataset.
+This model is a fine-tuned version of [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) on the [alakxender/dhivehi-image-text](https://huggingface.co/datasets/alakxender/dhivehi-image-text) dataset.
 It achieves the following results on the evaluation set:
@@ -19,15 +21,52 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
+Fine-tuned for Dhivehi image-to-text on the [alakxender/dhivehi-image-text](https://huggingface.co/datasets/alakxender/dhivehi-image-text) dataset, using the `all` configuration.
+
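+For a quick look at that training data, the `datasets` library can stream a sample. A minimal sketch, assuming the `all` config named above and a `train` split; the column names are a guess to verify:
+
+```python
+from datasets import load_dataset
+
+# Stream one sample rather than downloading the whole image-text set
+ds = load_dataset("alakxender/dhivehi-image-text", "all", split="train", streaming=True)
+sample = next(iter(ds))
+print(sample.keys())  # expected: an image column and a text column (names unverified)
+```
+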
+## Usage
+
+```python
+from PIL import Image
+import torch
+from transformers import NougatProcessor, VisionEncoderDecoderModel
+
+# Load the model and processor
+processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base")
+model = VisionEncoderDecoderModel.from_pretrained(
+    "alakxender/dhivehi-nougat-base",
+    torch_dtype=torch.bfloat16,           # Optional: BF16 for faster inference and lower memory usage
+    attn_implementation={                 # Optional: per-module attention kernel choices
+        "decoder": "flash_attention_2",   # FlashAttention-2 for the decoder (requires the flash-attn package)
+        "encoder": "eager",               # Default ("eager") attention for the encoder
+    },
+)
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model.to(device)
+
+context_length = 128
+
+def predict(img_path):
+    # Ensure the image is in RGB format
+    image = Image.open(img_path).convert("RGB")
+    # Cast pixel values to BF16 to match the model weights loaded above
+    pixel_values = processor(image, return_tensors="pt").pixel_values.to(torch.bfloat16)
+
+    # Generate the prediction
+    outputs = model.generate(
+        pixel_values.to(device),
+        min_length=1,
+        max_new_tokens=context_length,
+        repetition_penalty=1.5,           # Discourage the repetition loops OCR decoders are prone to
+        bad_words_ids=[[processor.tokenizer.unk_token_id]],
+        eos_token_id=processor.tokenizer.eos_token_id,
+    )
+
+    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
+    return page_sequence
+
+print(predict("DV01-04_31.jpg"))
+```
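+
+FlashAttention-2 requires the `flash-attn` package and a supported GPU. When either is missing, the `attn_implementation` and dtype arguments can simply be dropped; the sketch below is an untested fallback using only standard `transformers` calls:
+
+```python
+from PIL import Image
+from transformers import NougatProcessor, VisionEncoderDecoderModel
+
+# Fallback load: fp32 weights and default attention kernels (CPU-friendly)
+processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base")
+model = VisionEncoderDecoderModel.from_pretrained("alakxender/dhivehi-nougat-base")
+
+image = Image.open("DV01-04_31.jpg").convert("RGB")
+pixel_values = processor(image, return_tensors="pt").pixel_values  # fp32, matching the model
+outputs = model.generate(pixel_values, max_new_tokens=128)
+print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
+```
+
+`processor.post_process_generation(text, fix_markdown=False)` can additionally normalize the decoded markdown, as in the upstream Nougat examples; whether it helps Dhivehi output is untested.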
 
 ## Training procedure
 
@@ -144,4 +183,4 @@ The following hyperparameters were used during training:
 - Transformers 4.47.0
 - Pytorch 2.6.0+cu124
 - Datasets 3.2.0
-- Tokenizers 0.21.0
+- Tokenizers 0.21.0
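+
+As a sanity check, the pinned versions above can be compared against the running environment. A small sketch; note the card lists the torch build as `2.6.0+cu124`, so only the base version is compared:
+
+```python
+import datasets, tokenizers, torch, transformers
+
+# Versions taken from the Framework versions list above
+expected = {"transformers": "4.47.0", "torch": "2.6.0", "datasets": "3.2.0", "tokenizers": "0.21.0"}
+for mod in (transformers, torch, datasets, tokenizers):
+    installed = mod.__version__.split("+")[0]  # drop local build tags like +cu124
+    status = "OK" if installed == expected[mod.__name__] else "differs"
+    print(f"{mod.__name__}: {installed} (card: {expected[mod.__name__]}) {status}")
+```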
|