---
library_name: transformers
license: apache-2.0
base_model: Helsinki-NLP/opus-mt-en-fr
tags:
- translation
- generated_from_trainer
datasets:
- kde4
metrics:
- bleu
model-index:
- name: marian-finetuned-kde4-en-to-fr
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: kde4
      type: kde4
      config: en-fr
      split: train
      args: en-fr
    metrics:
    - name: Bleu
      type: bleu
      value: 50.54449537679619
---
# Marian Fine-Tuned KDE4 (English-to-French)
This model is a fine-tuned version of [Helsinki-NLP/opus-mt-en-fr](https://huggingface.co/Helsinki-NLP/opus-mt-en-fr) using the KDE4 dataset. It achieves the following results on the evaluation set:
- **Loss**: 0.9620
- **BLEU**: 50.5445
---
## Model Description
This English-to-French translation model was fine-tuned on the KDE4 dataset. The base model, Helsinki-NLP/opus-mt-en-fr, belongs to the MarianMT family of compact, efficient neural machine translation models.
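For lower-level control than the pipeline shown in the usage section below, the checkpoint can also be loaded with the standard `AutoTokenizer`/`AutoModelForSeq2SeqLM` classes. A minimal sketch (the `max_new_tokens` value is illustrative, not taken from the original setup):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "ParitKansal/marian-finetuned-kde4-en-to-fr"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Tokenize an English sentence and generate its French translation
inputs = tokenizer("Default to expanded threads", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)  # illustrative generation limit
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```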
---
## Intended Uses & Limitations
### Intended Uses
- Translating English text into French.
- Translations in the software-localization domain, especially text similar to the KDE4 corpus (UI strings and documentation).
### Limitations
- Performance may decline on texts outside the KDE4 domain.
- May struggle with idiomatic expressions, specialized technical jargon outside the KDE domain, or ambiguous content.
---
## Training & Evaluation Data
The model was fine-tuned on the KDE4 dataset, a specialized resource for machine translation in software localization. The evaluation metrics reflect the model's performance on this domain-specific data.
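A sketch of how the data can be loaded and split with the `datasets` library, following the form used in the Hugging Face course (the exact loading call depends on the installed `datasets` version and dataset revision, and the 90/10 split with seed 20 is an assumption, not necessarily the split used here):

```python
from datasets import load_dataset

# Load the English-French portion of KDE4 (only a "train" split is published)
raw_datasets = load_dataset("kde4", lang1="en", lang2="fr")

# Carve out a held-out evaluation set; the ratio and seed are illustrative
split_datasets = raw_datasets["train"].train_test_split(train_size=0.9, seed=20)
print(split_datasets)
```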
---
## Training Procedure
### Hyperparameters
The following hyperparameters were used during training (see the `Seq2SeqTrainingArguments` sketch after this list):
- **Learning Rate**: 2e-05
- **Train Batch Size**: 32
- **Eval Batch Size**: 64
- **Seed**: 42
- **Optimizer**: AdamW with `betas=(0.9, 0.999)`, `epsilon=1e-08`
- **LR Scheduler**: Linear
- **Epochs**: 1
- **Mixed Precision Training**: Native AMP
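These settings map roughly onto the standard `Seq2SeqTrainingArguments` API as sketched below; the output directory is a placeholder and the sketch is not the original training script:

```python
from transformers import Seq2SeqTrainingArguments

# AdamW with betas=(0.9, 0.999) and eps=1e-8 is the Trainer's default optimizer,
# so it needs no explicit argument here.
args = Seq2SeqTrainingArguments(
    output_dir="marian-finetuned-kde4-en-to-fr",  # placeholder directory
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    seed=42,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    fp16=True,                   # native AMP mixed precision
    predict_with_generate=True,  # generate translations during evaluation for BLEU
)
```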
### Results
- **Loss**: 0.9620
- **BLEU**: 50.5445
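The BLEU score is presumably corpus-level sacreBLEU as exposed by the `evaluate` library; a minimal sketch of that computation, with illustrative example sentences:

```python
import evaluate

metric = evaluate.load("sacrebleu")

# sacreBLEU expects detokenized predictions and a list of references per prediction
predictions = ["Par défaut, développer les fils de discussion"]
references = [["Par défaut, développer les fils de discussion"]]

result = metric.compute(predictions=predictions, references=references)
print(round(result["score"], 4))
```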
### Training Loss Progression
| Step | Training Loss |
|-------|---------------|
| 500 | 1.2253 |
| 1000 | 1.2165 |
| 1500 | 1.1913 |
| 2000 | 1.1404 |
| 2500 | 1.1178 |
| 3000 | 1.0900 |
| 3500 | 1.0594 |
| 4000 | 1.0512 |
| 4500 | 1.0633 |
| 5000 | 1.0405 |
| 5500 | 1.0316 |
---
## Framework Versions
- **Transformers**: 4.47.1
- **PyTorch**: 2.5.1+cu121
- **Datasets**: 3.2.0
- **Tokenizers**: 0.21.0
---
## Example Usage
```python
from transformers import pipeline

# Load the fine-tuned checkpoint into a translation pipeline
model_checkpoint = "ParitKansal/marian-finetuned-kde4-en-to-fr"
translator = pipeline("translation", model=model_checkpoint)

# Translate a KDE4-style UI string from English to French
translation = translator("Default to expanded threads")
print(translation[0]["translation_text"])
```
The pipeline returns a list of dictionaries with a `translation_text` field; the snippet above prints the translated string directly.
---