---
license: apache-2.0
language:
  - ne
metrics:
  - wer
  - cer
base_model:
  - microsoft/trocr-base-handwritten
pipeline_tag: image-to-text
library_name: transformers
tags:
  - trocr
  - nepali
  - ocr
  - handwritten-text
  - vision
  - text-recognition
---

# **TrOCR Fine-Tuned for Nepali Language**

## Model Description

This model is a fine-tuned version of [Microsoft's TrOCR model](https://huggingface.co/microsoft/trocr-base-handwritten) for optical character recognition (OCR), trained to recognize Nepali text in images of handwritten or printed content. It uses a VisionEncoderDecoder architecture with a DeiT-based encoder and a BERT-based decoder.

## Model Architecture

- **Encoder**: Vision Transformer (DeiT)
- **Decoder**: BERT-like architecture adapted for OCR tasks
- **Pretrained Base**: [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten)
- **Tokenizer**: Nepali BERT tokenizer from [Shushant/nepaliBERT](https://huggingface.co/Shushant/nepaliBERT)

## Training Details

- **Dataset**: Fine-tuned on a Nepali dataset consisting of handwritten and printed text.
- **Objective**: Generate accurate Nepali text from images containing textual content.
- **Decoding**: Beam search with a length penalty is applied at inference time to improve the quality of the generated text.
- **Beam Search Parameters**:
  - `num_beams = 8`
  - `length_penalty = 2.0`
  - `max_length = 47`
  - `no_repeat_ngram_size = 3`

## Usage

### Inference Example

To use this model for OCR, follow the steps below:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the fine-tuned model and processor
model = VisionEncoderDecoderModel.from_pretrained("rockerritesh/trOCR_ne")
processor = TrOCRProcessor.from_pretrained("rockerritesh/trOCR_ne")

# Load an image
image = Image.open("path_to_image.jpg").convert("RGB")

# Preprocess the image and generate a prediction
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, num_beams=8, max_length=47, early_stopping=True)
decoded_text = processor.tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]

print("Recognized Text:", decoded_text)
```

### Hugging Face Hub

The model and its processor are available on the Hugging Face Hub:

- **Model**: [rockerritesh/trOCR_ne](https://huggingface.co/rockerritesh/trOCR_ne)
- **Processor**: [rockerritesh/trOCR_ne](https://huggingface.co/rockerritesh/trOCR_ne)

### Features

- **OCR for Nepali**: Trained to recognize Nepali text in handwritten and printed formats.
- **Robust Tokenizer**: Uses the Nepali BERT tokenizer for efficient tokenization.
- **Efficient Inference**: Supports beam search and length penalties to optimize generation quality.
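For batches of text-line images, the same model can be run on several inputs at once. The snippet below is a minimal batched-inference sketch, assuming the same repository as above; the file names are placeholders, and the generation arguments simply mirror the beam-search parameters documented in this card.

```python
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("rockerritesh/trOCR_ne")
processor = TrOCRProcessor.from_pretrained("rockerritesh/trOCR_ne")
model.eval()

# Placeholder file names; replace with real image paths.
paths = ["line_001.jpg", "line_002.jpg"]
images = [Image.open(p).convert("RGB") for p in paths]

# The processor resizes and normalizes all images to the model's expected input size.
pixel_values = processor(images=images, return_tensors="pt").pixel_values

with torch.no_grad():
    output_ids = model.generate(
        pixel_values,
        num_beams=8,
        max_length=47,
        length_penalty=2.0,
        no_repeat_ngram_size=3,
        early_stopping=True,
    )

for path, text in zip(paths, processor.batch_decode(output_ids, skip_special_tokens=True)):
    print(f"{path}: {text}")
```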
## Fine-Tuning Details

### Hyperparameters

| Hyperparameter        | Value |
|-----------------------|-------|
| Batch Size            | 16    |
| Learning Rate         | 5e-5  |
| Epochs                | 5     |
| Optimizer             | AdamW |
| Beam Search Beams     | 8     |
| Max Length            | 47    |
| Length Penalty        | 2.0   |
| No Repeat N-Gram Size | 3     |

A minimal sketch wiring these values into a training setup appears in the appendix at the end of this card.

### Model Configuration

The model was configured as follows:

#### Decoder

- Activation Function: ReLU
- Attention Heads: 8
- Layers: 6
- Hidden Size: 256
- FFN Size: 1024

#### Encoder

- Hidden Size: 384
- Layers: 12
- Attention Heads: 6
- Image Size: 384

### Dataset Details

The fine-tuning dataset consists of diverse handwritten and printed Nepali text drawn from publicly available and custom sources.

## Limitations and Bias

- The model's performance depends on the quality and diversity of the fine-tuning dataset.
- It may not generalize well to unseen handwriting styles or printed text with unconventional fonts.

## Citation

If you use this model in your research or applications, please cite:

```bibtex
@misc{rockerritesh-trocr-nepali,
  title  = {Fine-Tuned TrOCR Model for Nepali Language},
  author = {Sumit Yadav},
  year   = {2024},
  url    = {https://huggingface.co/rockerritesh/trOCR_ne}
}
```

## License

This model is released under the Apache 2.0 license.
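## Appendix: Training Setup Sketch

For reference, the snippet below shows one plausible way to wire the hyperparameters and generation settings above into the standard `VisionEncoderDecoderModel` fine-tuning recipe. It is a hedged reconstruction, not the original training script: the tokenizer swap and vocabulary resizing are assumptions based on this card's architecture section, and the special-token wiring assumes nepaliBERT's BERT-style `[CLS]`/`[SEP]`/`[PAD]` tokens.

```python
from transformers import (
    AutoTokenizer,
    Seq2SeqTrainingArguments,
    VisionEncoderDecoderModel,
)

# Start from the base checkpoint and swap in the Nepali tokenizer
# (an assumption; the card states the tokenizer comes from Shushant/nepaliBERT).
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")
tokenizer = AutoTokenizer.from_pretrained("Shushant/nepaliBERT")

# Align the decoder's output vocabulary with the new tokenizer.
model.decoder.resize_token_embeddings(len(tokenizer))
model.config.decoder.vocab_size = len(tokenizer)

# Special-token wiring, assuming BERT-style [CLS]/[SEP]/[PAD] tokens.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id

# Generation settings documented in this card.
model.generation_config.max_length = 47
model.generation_config.num_beams = 8
model.generation_config.length_penalty = 2.0
model.generation_config.no_repeat_ngram_size = 3
model.generation_config.early_stopping = True

# Training hyperparameters from the table above; Seq2SeqTrainingArguments
# uses AdamW by default, matching the Optimizer row.
training_args = Seq2SeqTrainingArguments(
    output_dir="trocr-ne",  # hypothetical output directory
    per_device_train_batch_size=16,
    learning_rate=5e-5,
    num_train_epochs=5,
    predict_with_generate=True,
)

# A Seq2SeqTrainer would then consume `model` and `training_args` together with
# a dataset yielding {"pixel_values": ..., "labels": ...} examples.
```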