---
license: apache-2.0
language:
  - ne
metrics:
  - wer
  - cer
base_model:
  - microsoft/trocr-base-handwritten
pipeline_tag: image-to-text
library_name: transformers
tags:
  - trocr
  - nepali
  - ocr
  - handwritten-text
  - vision
  - text-recognition
---

# **TrOCR Fine-Tuned for Nepali Language**

## Model Description

This model is a fine-tuned version of [Microsoft's TrOCR model](https://huggingface.co/microsoft/trocr-base-handwritten) for optical character recognition (OCR), trained to recognize Nepali text in images of handwritten or printed content. It uses a VisionEncoderDecoder architecture with a DeiT-based encoder and a BERT-based decoder.

## Model Architecture

- **Encoder**: Vision Transformer (DeiT)
- **Decoder**: BERT-like architecture adapted for OCR tasks
- **Pretrained Base**: [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten)
- **Tokenizer**: Nepali BERT tokenizer from [Shushant/nepaliBERT](https://huggingface.co/Shushant/nepaliBERT)

## Training Details

- **Dataset**: Fine-tuned on a Nepali dataset consisting of handwritten and printed text.
- **Objective**: Generate accurate Nepali text from images containing textual content.
- **Decoding**: Beam search with a length penalty is applied at inference time to improve the quality of the generated text.
- **Beam Search Parameters**:
  - `num_beams = 8`
  - `length_penalty = 2.0`
  - `max_length = 47`
  - `no_repeat_ngram_size = 3`

## Usage

### Inference Example

To use this model for OCR, follow the steps below:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the fine-tuned model and processor
model = VisionEncoderDecoderModel.from_pretrained("rockerritesh/trOCR_ne")
processor = TrOCRProcessor.from_pretrained("rockerritesh/trOCR_ne")

# Load an image
image = Image.open("path_to_image.jpg").convert("RGB")

# Preprocess the image and generate a prediction
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, num_beams=8, max_length=47, early_stopping=True)
decoded_text = processor.tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

print("Recognized Text:", decoded_text)
```

### Hugging Face Hub

The model and its processor are available on the Hugging Face Hub:

- **Model**: [rockerritesh/trOCR_ne](https://huggingface.co/rockerritesh/trOCR_ne)
- **Processor**: [rockerritesh/trOCR_ne](https://huggingface.co/rockerritesh/trOCR_ne)

### Features

- **OCR for Nepali**: Trained to recognize Nepali text in handwritten and printed formats.
- **Robust Tokenizer**: Uses the Nepali BERT tokenizer for efficient tokenization.
- **Efficient Inference**: Supports beam search and length penalties to optimize generation quality.
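For batches of text-line images, the same model can be run on several inputs at once. The snippet below is a minimal batched-inference sketch, assuming the same repository as above; the file names are placeholders, and the generation arguments simply mirror the beam-search parameters documented in this card.

```python
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("rockerritesh/trOCR_ne")
processor = TrOCRProcessor.from_pretrained("rockerritesh/trOCR_ne")
model.eval()

# Placeholder file names; replace with real image paths.
paths = ["line_001.jpg", "line_002.jpg"]
images = [Image.open(p).convert("RGB") for p in paths]

# The processor resizes and normalizes all images to the model's expected input size.
pixel_values = processor(images=images, return_tensors="pt").pixel_values

with torch.no_grad():
    output_ids = model.generate(
        pixel_values,
        num_beams=8,
        max_length=47,
        length_penalty=2.0,
        no_repeat_ngram_size=3,
        early_stopping=True,
    )

for path, text in zip(paths, processor.batch_decode(output_ids, skip_special_tokens=True)):
    print(f"{path}: {text}")
```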
## Fine-Tuning Details

### Hyperparameters

| Hyperparameter        | Value |
|-----------------------|-------|
| Batch Size            | 16    |
| Learning Rate         | 5e-5  |
| Epochs                | 5     |
| Optimizer             | AdamW |
| Beam Search Beams     | 8     |
| Max Length            | 47    |
| Length Penalty        | 2.0   |
| No Repeat N-Gram Size | 3     |

A minimal sketch wiring these values into a training setup appears in the appendix at the end of this card.

### Model Configuration

The model was configured as follows:

#### Decoder

- Activation Function: ReLU
- Attention Heads: 8
- Layers: 6
- Hidden Size: 256
- FFN Size: 1024

#### Encoder

- Hidden Size: 384
- Layers: 12
- Attention Heads: 6
- Image Size: 384

### Dataset Details

The fine-tuning dataset consists of diverse handwritten and printed Nepali text drawn from publicly available and custom sources.

## Limitations and Bias

- The model's performance depends on the quality and diversity of the fine-tuning dataset.
- It may not generalize well to unseen handwriting styles or printed text with unconventional fonts.

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{rockerritesh-trocr-nepali,
  title  = {Fine-Tuned TrOCR Model for Nepali Language},
  author = {Sumit Yadav},
  year   = {2024},
  url    = {https://huggingface.co/rockerritesh/trOCR_ne}
}
```

## License

This model is released under the Apache 2.0 license.
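## Appendix: Training Setup Sketch

For reference, the snippet below shows one plausible way to wire the hyperparameters and generation settings above into the standard `VisionEncoderDecoderModel` fine-tuning recipe. It is a hedged reconstruction, not the original training script: the tokenizer swap and vocabulary resizing are assumptions based on this card's architecture section, and the special-token wiring assumes nepaliBERT's BERT-style `[CLS]`/`[SEP]`/`[PAD]` tokens.

```python
from transformers import (
    AutoTokenizer,
    Seq2SeqTrainingArguments,
    VisionEncoderDecoderModel,
)

# Start from the base checkpoint and swap in the Nepali tokenizer
# (an assumption; the card states the tokenizer comes from Shushant/nepaliBERT).
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")
tokenizer = AutoTokenizer.from_pretrained("Shushant/nepaliBERT")

# Align the decoder's output vocabulary with the new tokenizer.
model.decoder.resize_token_embeddings(len(tokenizer))
model.config.decoder.vocab_size = len(tokenizer)

# Special-token wiring, assuming BERT-style [CLS]/[SEP]/[PAD] tokens.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id

# Generation settings documented in this card.
model.generation_config.max_length = 47
model.generation_config.num_beams = 8
model.generation_config.length_penalty = 2.0
model.generation_config.no_repeat_ngram_size = 3
model.generation_config.early_stopping = True

# Training hyperparameters from the table above; Seq2SeqTrainingArguments
# uses AdamW by default, matching the Optimizer row.
training_args = Seq2SeqTrainingArguments(
    output_dir="trocr-ne",  # hypothetical output directory
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=5,
    predict_with_generate=True,
)

# A Seq2SeqTrainer would then consume `model` and `training_args` together with
# a dataset yielding {"pixel_values": ..., "labels": ...} examples.
```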