MT5 Dhivehi (gatitos__en_dv fine-tuned)

This model is a fine-tuned version of google/mt5-small on the gatitos__en_dv subset of Google's SMOL dataset.

⚠️ This is not a general-purpose translator. This fine-tune exists only to test MT5 on Dhivehi and is not intended for any other use.

Model Summary

  • Base model: google/mt5-small
  • Task: Translation (English → Dhivehi)
  • Domain: Unknown. This is an experimental fine-tune, so try single words or short phrases only.
  • Dataset: google/smol (gatitos__en_dv config); see the loading sketch after this list
  • Training framework: Hugging Face Transformers
  • Loss target: ~0.01
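
A minimal sketch for loading the training data, assuming the gatitos__en_dv configuration of the google/smol dataset on the Hugging Face Hub (field names are not documented here, so inspect a sample first):

from datasets import load_dataset

# Assumed Hub location: the gatitos__en_dv config of google/smol
ds = load_dataset("google/smol", "gatitos__en_dv", split="train")

# Inspect one English-Dhivehi pair; the exact columns depend on the dataset schema
print(ds[0])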

Training Details

  • Epochs: 90
  • Batch size: 4
  • Learning rate: 5e-5 (constant)
  • Final train loss: 0.3797
  • Gradient norm (last): 15.72
  • Total steps: 89,460
  • Samples/sec: ~14.24
  • FLOPs: 2.36e+16
  • Training time: ~6.98 hours (25,117 seconds)
  • Optimizer: AdamW
  • Scheduler: Constant (no decay)
  • Logging: Weights & Biases
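
A minimal sketch of how the hyperparameters above map onto Hugging Face training arguments; output_dir, logging_steps, and the exact optimizer string are assumptions, and the dataset preprocessing and Seq2SeqTrainer setup are omitted:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-dhivehi-word-parallel",  # assumed output directory
    num_train_epochs=90,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    lr_scheduler_type="constant",  # constant learning rate, no decay
    optim="adamw_torch",           # AdamW optimizer
    report_to="wandb",             # log to Weights & Biases
    logging_steps=50,              # assumed logging interval
)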

Example Usage

from transformers import MT5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned checkpoint and its tokenizer
model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# English input with the translation task prefix
text = "translate English to Dhivehi: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
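
For an interactive demo, the same snippet can be wrapped in a small Gradio interface; a minimal sketch, where the interface labels are assumptions:

import gradio as gr
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

def translate(text):
    # Same task prefix as in the snippet above
    inputs = tokenizer("translate English to Dhivehi: " + text, return_tensors="pt")
    output = model.generate(**inputs, max_length=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

demo = gr.Interface(fn=translate, inputs="text", outputs="text", title="MT5 Dhivehi (experimental)")
demo.launch()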

Intended Use

This model is meant for:

  • Research in low-resource translation
  • Experimentation with Dhivehi-language modeling
  • Experimentation with the tokenizer (see the sketch below)
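
A minimal sketch for inspecting how the mT5 SentencePiece tokenizer segments Thaana script, assuming the same checkpoint; the example word is only an illustration:

from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# Example Dhivehi word ("Dhivehi" in Thaana script)
word = "ދިވެހި"
pieces = tokenizer.tokenize(word)
print(pieces)
print(tokenizer.convert_tokens_to_ids(pieces))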