---
license: apache-2.0
base_model: DotsOCR
tags:
- vision
- ocr
- document-understanding
- text-extraction
datasets:
- custom
language:
- en
pipeline_tag: image-to-text
---

# dots_table

This is a fine-tuned version of DotsOCR, optimized for document OCR tasks.

## Model Details

- **Base Model**: DotsOCR (1.7B parameters)
- **Training**: LoRA fine-tuning with rank 48
- **Task**: Document text extraction and OCR
- **Input**: Document images
- **Output**: Extracted text in structured format

## Usage

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from qwen_vl_utils import process_vision_info  # from the qwen-vl-utils package
import torch
from PIL import Image

# Load model and processor
model = AutoModelForCausalLM.from_pretrained(
    "NirajRajai/dots_table",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    # Requires the flash-attn package; drop this argument if it is not installed
    attn_implementation="flash_attention_2"
)
processor = AutoProcessor.from_pretrained(
    "NirajRajai/dots_table",
    trust_remote_code=True
)

# Build a chat message containing the image and the instruction
image = Image.open("document.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract the text content from this image."}
        ]
    }
]

# Render the prompt and prepare the model inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt"
).to(model.device)

# Generate, then strip the prompt tokens from each output sequence
generated_ids = model.generate(**inputs, max_new_tokens=2048)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]

# Decode the newly generated tokens to text
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)[0]
print(output_text)
```

## Training Details

- **Hardware**: NVIDIA H100 80GB
- **Training Duration**: 3 epochs
- **Batch Size**: 2 (with gradient accumulation)
- **Learning Rate**: 5e-5
- **Optimizer**: AdamW 8-bit

## License

Apache 2.0

## Citation

If you use this model, please cite the original DotsOCR paper and this fine-tuned version.
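
## Notes on the Usage Example

The usage example slices each generated sequence with `out_ids[len(in_ids):]` before decoding. For decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so this slice keeps only the newly generated part. The sketch below illustrates the same slicing with plain Python lists. The helper name `trim_prompt_tokens` and the token IDs are hypothetical, chosen only for illustration, and are not part of the model's API.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the leading prompt tokens from each generated sequence.

    Hypothetical helper mirroring the list comprehension in the usage example.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Made-up token IDs: each generated sequence starts with its own prompt.
prompt_batch = [[101, 7, 8], [101, 9]]
generated_batch = [[101, 7, 8, 42, 43], [101, 9, 55]]

trimmed = trim_prompt_tokens(prompt_batch, generated_batch)
print(trimmed)  # [[42, 43], [55]]
```

Without this step, the decoded output would begin with the rendered chat prompt rather than the extracted text.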