NirajRajai/dots_table

NirajRajai committed 9 days ago

Commit 3ff07bd (verified) · 1 parent: 48a78fb

Add model card

Files changed (1)

README.md +96 -3

@@ -1,3 +1,96 @@
----
-license: mit
----

+---
+license: apache-2.0
+base_model: DotsOCR
+tags:
+- vision
+- ocr
+- document-understanding
+- text-extraction
+datasets:
+- custom
+language:
+- en
+pipeline_tag: image-to-text
+---
+# dots_table
+dots_table is a LoRA fine-tune of DotsOCR for extracting text from document images.
+## Model Details
+- **Base Model**: DotsOCR (1.7B parameters)
+- **Training**: LoRA fine-tuning with rank 48
+- **Task**: Document text extraction and OCR
+- **Input**: Document images
+- **Output**: Extracted text in structured format
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoProcessor
+import torch
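+from PIL import Image
+# process_vision_info is not part of transformers; this assumes the qwen-vl-utils package (pip install qwen-vl-utils)
+from qwen_vl_utils import process_vision_info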
+# Load model and processor
+model = AutoModelForCausalLM.from_pretrained(
+    "NirajRajai/dots_table",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True,
+    attn_implementation="flash_attention_2",  # requires the flash-attn package; drop this argument if it is not installed
+)
+processor = AutoProcessor.from_pretrained(
+    "NirajRajai/dots_table",
+    trust_remote_code=True
+)
+# Process image
+image = Image.open("document.png")
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image", "image": image},
+            {"type": "text", "text": "Extract the text content from this image."}
+        ]
+    }
+]
+# Build the chat prompt and collect the image inputs
+text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+image_inputs, video_inputs = process_vision_info(messages)
+inputs = processor(
+    text=[text],
+    images=image_inputs,
+    videos=video_inputs,
+    padding=True,
+    return_tensors="pt"
+).to(model.device)
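+# Generate the output token IDs
+generated_ids = model.generate(**inputs, max_new_tokens=2048)
+# Strip the prompt tokens so that only newly generated text is decoded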
+generated_ids_trimmed = [
+    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+]
+output_text = processor.batch_decode(
+    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+)[0]
+print(output_text)
+```
+## Training Details
+- **Hardware**: NVIDIA H100 80GB
+- **Training Duration**: 3 epochs
+- **Batch Size**: 2 (with gradient accumulation)
+- **Learning Rate**: 5e-5
+- **Optimizer**: AdamW 8-bit
+## License
+Apache 2.0
+## Citation
+If you use this model, please cite the original DotsOCR paper and this fine-tuned version.
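
The training details in the added README (LoRA rank 48, batch size 2 with gradient accumulation) can be made concrete with a little arithmetic. The sketch below is not taken from the commit: the hidden size of 1536 and the 8 accumulation steps are hypothetical values chosen for illustration, and `lora_param_count` and `effective_batch_size` are helper names introduced here. It shows how many trainable parameters a rank-48 adapter adds to a single square linear layer, and how gradient accumulation scales the batch size seen by each optimizer step.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one (d_out x d_in) linear layer.

    LoRA learns a low-rank update B @ A, where A has shape (rank, d_in)
    and B has shape (d_out, rank), so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to a single optimizer step."""
    return per_device * accumulation_steps * num_devices


RANK = 48                # from the README
PER_DEVICE_BATCH = 2     # from the README
HIDDEN = 1536            # hypothetical hidden size, not stated in the README
ACCUMULATION_STEPS = 8   # hypothetical, the README does not give this value

adapter_params = lora_param_count(HIDDEN, HIDDEN, RANK)
full_params = HIDDEN * HIDDEN

print(adapter_params)                                          # 147456
print(f"{adapter_params / full_params:.2%}")                   # 6.25%
print(effective_batch_size(PER_DEVICE_BATCH, ACCUMULATION_STEPS))  # 16
```

Under these assumed dimensions the adapter trains 6.25% of the parameters of the layer it wraps. The real figure depends on DotsOCR's actual layer shapes and on which modules the adapter targets, neither of which the README states.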