mt-ja-mr-small100
This is a Japanese-to-Marathi neural machine translation model trained from scratch on a reduced-parameter variant of alirezamsh/small100, which is itself a compact version of the original M2M100 model by Facebook AI.
The model was trained on the MultiCCAligned dataset, consisting of approximately 210,000 Japanese-Marathi sentence pairs for training and 51,000 for evaluation.
Model description
This model is a distilled version of the multilingual M2M100 architecture, originally proposed by Facebook AI for direct translation between 100 languages without relying on English as an intermediate. The small100 variant reduces the parameter count to allow faster training and lower resource usage while retaining multilingual capabilities.
- Base model: alirezamsh/small100 (based on M2M100)
- Original architecture: Facebook's M2M100
- Tokenizer: M2M100Tokenizer (SentencePiece-based)
- Parameter count: 97.1M (down from 333M)
- Mixed Precision: Trained using FP16
- Max GPU VRAM usage: ~5.2 GB during training
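The checkpoint can be loaded with the standard M2M100 classes from transformers. Below is a minimal inference sketch; the repository id is a placeholder, and the target language is forced to Marathi through the tokenizer's language id.

```python
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_id = "your-username/mt-ja-mr-small100"  # placeholder repo id
tokenizer = M2M100Tokenizer.from_pretrained(model_id)
model = M2M100ForConditionalGeneration.from_pretrained(model_id)

# Japanese source text
tokenizer.src_lang = "ja"
inputs = tokenizer("今日はいい天気ですね。", return_tensors="pt")

# Force Marathi as the generation target
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("mr"),
    max_length=128,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```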
Intended uses & limitations
Intended Use
- Translation of general-domain text from Japanese to Marathi.
Limitations
- Dataset noise: The MultiCCAligned dataset was used without any cleaning. It contains English sentences and possibly interference from other languages.
- Biases: Some biases from noisy multilingual content in the training data may have been learned.
- Underfitting: Training loss plateaued around 10,000 steps, suggesting early convergence and possible underfitting.
- No formal evaluation: Although 51,000 evaluation examples were available, no BLEU or other metrics were reported due to stagnant loss curves.
Training and evaluation data
- Dataset: MultiCCAligned (ja-mr)
- Training samples: ~210,000
- Evaluation samples: ~51,000
- The dataset was used without filtering or preprocessing, which may introduce noise.
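A minimal data-preparation sketch is shown below, assuming the MultiCCAligned ja-mr pairs have been exported to local tab-separated files (the file names, placeholder repository id, and max_length of 128 are assumptions; the exact pipeline used for this model is not documented).

```python
from datasets import load_dataset
from transformers import M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("your-username/mt-ja-mr-small100")  # placeholder id
tokenizer.src_lang = "ja"
tokenizer.tgt_lang = "mr"

raw = load_dataset(
    "csv",
    data_files={
        "train": "multiccaligned.ja-mr.train.tsv",  # ~210,000 pairs
        "eval": "multiccaligned.ja-mr.eval.tsv",    # ~51,000 pairs
    },
    delimiter="\t",
    column_names=["ja", "mr"],
)

def preprocess(batch):
    # Tokenize Japanese sources; Marathi targets become the labels.
    model_inputs = tokenizer(batch["ja"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["mr"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=["ja", "mr"])
```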
Training procedure
Hyperparameters
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 64
- optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- lr_scheduler_type: linear
- num_epochs: 3
- seed: 42
- mixed_precision_training: Native AMP (fp16)
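The sketch below shows one way these hyperparameters map onto transformers' Seq2SeqTrainingArguments. The actual training script is not published, so the output directory and the tokenized dataset names are assumptions.

```python
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

training_args = Seq2SeqTrainingArguments(
    output_dir="mt-ja-mr-small100",   # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,                        # Native AMP mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=tokenized["train"],  # hypothetical tokenized splits
#     eval_dataset=tokenized["eval"],
#     tokenizer=tokenizer,
# )
# trainer.train()
```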