NirajRajai committed on
Commit 3ff07bd · verified · 1 Parent(s): 48a78fb

Add model card

  1. README.md +96 -3
README.md CHANGED
@@ -1,3 +1,96 @@
- ---
- license: mit
- ---
+ ---
+ license: apache-2.0
+ base_model: DotsOCR
+ tags:
+ - vision
+ - ocr
+ - document-understanding
+ - text-extraction
+ datasets:
+ - custom
+ language:
+ - en
+ pipeline_tag: image-to-text
+ ---
+
+ # dots_table
+
+ This is a fine-tuned version of DotsOCR, optimized for document OCR tasks.
+
+ ## Model Details
+
+ - **Base Model**: DotsOCR (1.7B parameters)
+ - **Training**: LoRA fine-tuning with rank 48
+ - **Task**: Document text extraction and OCR
+ - **Input**: Document images
+ - **Output**: Extracted text in structured format
+
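As a rough illustration of what "rank 48" implies: LoRA keeps each adapted weight matrix of shape `d_out x d_in` frozen and learns a low-rank update `B @ A`, with `B` of shape `d_out x r` and `A` of shape `r x d_in`, so each adapted matrix adds `r * (d_in + d_out)` trainable parameters. The sketch below computes that count. The 1536 x 1536 layer shape is a hypothetical placeholder, not a value from the model card, and the card does not say which layers were adapted.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * d_in + d_out * rank


RANK = 48      # from the model card
HIDDEN = 1536  # hypothetical layer width, for illustration only

added = lora_trainable_params(HIDDEN, HIDDEN, RANK)
full = HIDDEN * HIDDEN

print(f"LoRA parameters per adapted matrix: {added}")
print(f"Fraction of the full matrix: {added / full:.4f}")
```

The fraction depends only on the rank and the layer shape, so the same function can be reused once the actual adapted layers and their dimensions are known.
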
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoProcessor
+ import torch
+ from PIL import Image
+ from qwen_vl_utils import process_vision_info  # helper from the qwen-vl-utils package
+
+ # Load model and processor
+ model = AutoModelForCausalLM.from_pretrained(
+     "NirajRajai/dots_table",
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+     attn_implementation="flash_attention_2",  # requires the flash-attn package
+ )
+ processor = AutoProcessor.from_pretrained(
+     "NirajRajai/dots_table",
+     trust_remote_code=True,
+ )
+
+ # Build a single-turn chat message containing the image and the instruction
+ image = Image.open("document.png")
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": "Extract the text content from this image."},
+         ],
+     }
+ ]
+
+ # Render the prompt and prepare the model inputs
+ text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ image_inputs, video_inputs = process_vision_info(messages)
+ inputs = processor(
+     text=[text],
+     images=image_inputs,
+     videos=video_inputs,
+     padding=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ # Generate, then drop the prompt tokens so only the new tokens are decoded
+ generated_ids = model.generate(**inputs, max_new_tokens=2048)
+ generated_ids_trimmed = [
+     out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )[0]
+
+ print(output_text)
+ ```
+
+ ## Training Details
+
+ - **Hardware**: NVIDIA H100 80GB
+ - **Training Duration**: 3 epochs
+ - **Batch Size**: 2 (with gradient accumulation)
+ - **Learning Rate**: 5e-5
+ - **Optimizer**: AdamW 8-bit
+
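"Batch size 2 (with gradient accumulation)" means gradients from several micro-batches of 2 samples are combined before each optimizer step, so the effective batch size is 2 times the number of accumulation steps. The card does not state that number. The sketch below assumes a hypothetical value of 8 and uses a toy one-parameter least-squares model (not the actual training code) to show that averaging micro-batch gradients reproduces the gradient of the full effective batch.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Average micro-batch gradients, as gradient accumulation does."""
    micro_batches = [
        data[i:i + micro_batch_size]
        for i in range(0, len(data), micro_batch_size)
    ]
    # Each micro-batch gradient is scaled by 1 / number_of_micro_batches,
    # mirroring the usual loss / accumulation_steps scaling.
    return sum(grad_mse(w, mb) / len(micro_batches) for mb in micro_batches)


MICRO_BATCH_SIZE = 2  # from the model card
ACCUM_STEPS = 8       # hypothetical; the card does not state this value

# Toy data: y = 3x + 1 sampled at x = 0, 1, ..., 15
data = [(float(i), 3.0 * i + 1.0) for i in range(MICRO_BATCH_SIZE * ACCUM_STEPS)]
w = 0.5

full_grad = grad_mse(w, data)
accum_grad = accumulated_grad(w, data, MICRO_BATCH_SIZE)

print(f"effective batch size: {MICRO_BATCH_SIZE * ACCUM_STEPS}")
print(f"full-batch gradient:  {full_grad:.6f}")
print(f"accumulated gradient: {accum_grad:.6f}")
```

The two gradients agree because every micro-batch has the same size; with a ragged final micro-batch, a plain average of micro-batch gradients would weight its samples more heavily.
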
+ ## License
+
+ Apache 2.0
+
+ ## Citation
+
+ If you use this model, please cite the original DotsOCR paper and this fine-tuned version.