mt-ja-mr-small100
This is a Japanese-to-Marathi neural machine translation model trained from scratch on a reduced-parameter variant of alirezamsh/small100, which is itself a compact version of the original M2M100 model by Facebook AI.
The model was trained on the MultiCCAligned dataset, consisting of approximately 210,000 Japanese-Marathi sentence pairs for training and 51,000 for evaluation.
Model description
This model is a distilled version of the multilingual M2M100 architecture, originally proposed by Facebook AI for direct translation between 100 languages without relying on English as an intermediate. The small100 variant reduces the parameter count to allow faster training and lower resource usage while retaining multilingual capabilities.
- Base model: alirezamsh/small100 (based on M2M100)
- Original architecture: Facebook's M2M100
- Tokenizer: M2M100Tokenizer (SentencePiece-based)
- Parameter count: 97.1M (down from 333M)
- Mixed Precision: Trained using FP16
- Max GPU VRAM usage: ~5.2 GB during training
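The checkpoint can be loaded with the standard M2M100 classes from transformers. Below is a minimal inference sketch; the repository id is a placeholder, and the target language is forced to Marathi through the tokenizer's language id.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_id = "your-username/mt-ja-mr-small100"  # placeholder repo id
tokenizer = M2M100Tokenizer.from_pretrained(model_id)
model = M2M100ForConditionalGeneration.from_pretrained(model_id)

# Japanese source text
tokenizer.src_lang = "ja"
inputs = tokenizer("今日はいい天気ですね。", return_tensors="pt")

# Force Marathi as the generation target
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("mr"),
    max_length=128,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```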
Intended uses & limitations
Intended Use
- Translation of general-domain text from Japanese to Marathi.
Limitations
- Dataset noise: The MultiCCAligned dataset was used without any cleaning. It contains English sentences and possibly interference from other languages.
- Biases: Some biases from noisy multilingual content in the training data may have been learned.
- Underfitting: Training loss plateaued around 10,000 steps, suggesting early convergence and possible underfitting.
- No formal evaluation: Although 51,000 evaluation examples were available, no BLEU or other metrics were reported due to stagnant loss curves.
Training and evaluation data
- Dataset: MultiCCAligned (ja-mr)
- Training samples: ~210,000
- Evaluation samples: ~51,000
- The dataset was used without filtering or preprocessing, which may introduce noise.
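A minimal data-preparation sketch is shown below, assuming the MultiCCAligned ja-mr pairs have been exported to local tab-separated files (the file names, placeholder repository id, and max_length of 128 are assumptions; the exact pipeline used for this model is not documented).

```python
from datasets import load_dataset
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("your-username/mt-ja-mr-small100")  # placeholder id
tokenizer.src_lang = "ja"
tokenizer.tgt_lang = "mr"

raw = load_dataset(
    "csv",
    data_files={
        "train": "multiccaligned.ja-mr.train.tsv",  # ~210,000 pairs
        "eval": "multiccaligned.ja-mr.eval.tsv",    # ~51,000 pairs
    },
    delimiter="\t",
    column_names=["ja", "mr"],
)

def preprocess(batch):
    # Tokenize Japanese sources; Marathi targets become the labels.
    model_inputs = tokenizer(batch["ja"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["mr"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=["ja", "mr"])
```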
Training procedure
Hyperparameters
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 64
- optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- lr_scheduler_type: linear
- num_epochs: 3
- seed: 42
- mixed_precision_training: Native AMP (fp16)
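The sketch below shows one way these hyperparameters map onto transformers' Seq2SeqTrainingArguments. The actual training script is not published, so the output directory and the tokenized dataset names are assumptions.

```python
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

training_args = Seq2SeqTrainingArguments(
    output_dir="mt-ja-mr-small100",   # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,                        # Native AMP mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=tokenized["train"],  # hypothetical tokenized splits
#     eval_dataset=tokenized["eval"],
#     tokenizer=tokenizer,
# )
# trainer.train()
```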