Update README.md
README.md CHANGED
@@ -1,17 +1,19 @@
 ---
 base_model: facebook/nougat-base
 library_name: transformers
+license: cc-by-4.0
 tags:
 - generated_from_trainer
 model-index:
 - name: dhivehi-nougat-base
   results: []
+datasets:
+- alakxender/dhivehi-image-text
+language:
+- dv
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
-# dhivehi-nougat-base
+# DHIVEHI NOUGAT BASE (IMAGE-TO-TEXT)
 
-This model is a fine-tuned version of [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) on the None dataset.
+This model is a fine-tuned version of [facebook/nougat-base](https://huggingface.co/facebook/nougat-base) on the [alakxender/dhivehi-image-text](https://huggingface.co/datasets/alakxender/dhivehi-image-text) dataset.
 It achieves the following results on the evaluation set:
@@ -19,15 +21,52 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
-## Training and evaluation data
-
-More information needed
+Fine-tuned for Dhivehi image-to-text on the [alakxender/dhivehi-image-text](https://huggingface.co/datasets/alakxender/dhivehi-image-text) dataset, using the `all` configuration.
+
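+For a quick look at that training data, the `datasets` library can stream a sample. A minimal sketch, assuming the `all` config named above and a `train` split; the column names are a guess to verify:
+
+```python
+from datasets import load_dataset
+
+# Stream one sample rather than downloading the whole image-text set
+ds = load_dataset("alakxender/dhivehi-image-text", "all", split="train", streaming=True)
+sample = next(iter(ds))
+print(sample.keys())  # expected: an image column and a text column (names unverified)
+```
+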
+## Usage
+
+```python
+from PIL import Image
+import torch
+from transformers import NougatProcessor, VisionEncoderDecoderModel
+
+# Load the model and processor
+processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base")
+model = VisionEncoderDecoderModel.from_pretrained(
+    "alakxender/dhivehi-nougat-base",
+    torch_dtype=torch.bfloat16,           # Optional: BF16 for faster inference and lower memory usage
+    attn_implementation={                 # Optional: per-module attention kernel choices
+        "decoder": "flash_attention_2",   # FlashAttention-2 for the decoder (requires the flash-attn package)
+        "encoder": "eager",               # Default ("eager") attention for the encoder
+    },
+)
+
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model.to(device)
+
+context_length = 128
+
+def predict(img_path):
+    # Ensure the image is in RGB format
+    image = Image.open(img_path).convert("RGB")
+    # Cast pixel values to BF16 to match the model weights loaded above
+    pixel_values = processor(image, return_tensors="pt").pixel_values.to(torch.bfloat16)
+
+    # Generate the prediction
+    outputs = model.generate(
+        pixel_values.to(device),
+        min_length=1,
+        max_new_tokens=context_length,
+        repetition_penalty=1.5,           # Discourage the repetition loops OCR decoders are prone to
+        bad_words_ids=[[processor.tokenizer.unk_token_id]],
+        eos_token_id=processor.tokenizer.eos_token_id,
+    )
+
+    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
+    return page_sequence
+
+print(predict("DV01-04_31.jpg"))
+```
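+
+FlashAttention-2 requires the `flash-attn` package and a supported GPU. When either is missing, the `attn_implementation` and dtype arguments can simply be dropped; the sketch below is an untested fallback using only standard `transformers` calls:
+
+```python
+from PIL import Image
+from transformers import NougatProcessor, VisionEncoderDecoderModel
+
+# Fallback load: fp32 weights and default attention kernels (CPU-friendly)
+processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base")
+model = VisionEncoderDecoderModel.from_pretrained("alakxender/dhivehi-nougat-base")
+
+image = Image.open("DV01-04_31.jpg").convert("RGB")
+pixel_values = processor(image, return_tensors="pt").pixel_values  # fp32, matching the model
+outputs = model.generate(pixel_values, max_new_tokens=128)
+print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
+```
+
+`processor.post_process_generation(text, fix_markdown=False)` can additionally normalize the decoded markdown, as in the upstream Nougat examples; whether it helps Dhivehi output is untested.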
 
 ## Training procedure
 
@@ -144,4 +183,4 @@ The following hyperparameters were used during training:
 - Transformers 4.47.0
 - Pytorch 2.6.0+cu124
 - Datasets 3.2.0
-- Tokenizers 0.21.0
+- Tokenizers 0.21.0
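+
+As a sanity check, the pinned versions above can be compared against the running environment. A small sketch; note the card lists the torch build as `2.6.0+cu124`, so only the base version is compared:
+
+```python
+import datasets, tokenizers, torch, transformers
+
+# Versions taken from the Framework versions list above
+expected = {"transformers": "4.47.0", "torch": "2.6.0", "datasets": "3.2.0", "tokenizers": "0.21.0"}
+for mod in (transformers, torch, datasets, tokenizers):
+    installed = mod.__version__.split("+")[0]  # drop local build tags like +cu124
+    status = "OK" if installed == expected[mod.__name__] else "differs"
+    print(f"{mod.__name__}: {installed} (card: {expected[mod.__name__]}) {status}")
+```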
|