MT5 Dhivehi (gatitos__en_dv fine-tuned)

This model is a fine-tuned version of google/mt5-small on the gatitos__en_dv subset of Google's SMOL dataset.

⚠️ This is not a general-purpose translator. This fine-tune exists only to test MT5 on Dhivehi and is not intended for any other use.

Model Summary

  • Base model: google/mt5-small
  • Task: Translation (English → Dhivehi)
  • Domain: Unknown. This is an experimental fine-tune, so try single words or short phrases only.
  • Dataset: google/smol (gatitos__en_dv config); see the loading sketch after this list
  • Training framework: Hugging Face Transformers
  • Loss target: ~0.01
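
A minimal sketch for loading the training data, assuming the gatitos__en_dv configuration of the google/smol dataset on the Hugging Face Hub (field names are not documented here, so inspect a sample first):

from datasets import load_dataset

# Assumed Hub location: the gatitos__en_dv config of google/smol
ds = load_dataset("google/smol", "gatitos__en_dv", split="train")

# Inspect one English-Dhivehi pair; the exact columns depend on the dataset schema
print(ds[0])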

Training Details

  • Epochs: 90
  • Batch size: 4
  • Learning rate: 5e-5 (constant)
  • Final train loss: 0.3797
  • Gradient norm (last): 15.72
  • Total steps: 89,460
  • Samples/sec: ~14.24
  • FLOPs: 2.36e+16
  • Training time: ~6.98 hours (25,117 seconds)
  • Optimizer: AdamW
  • Scheduler: Constant (no decay)
  • Logging: Weights & Biases
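
A minimal sketch of how the hyperparameters above map onto Hugging Face training arguments; output_dir, logging_steps, and the exact optimizer string are assumptions, and the dataset preprocessing and Seq2SeqTrainer setup are omitted:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-dhivehi-word-parallel",  # assumed output directory
    num_train_epochs=90,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    lr_scheduler_type="constant",  # constant learning rate, no decay
    optim="adamw_torch",           # AdamW optimizer
    report_to="wandb",             # log to Weights & Biases
    logging_steps=50,              # assumed logging interval
)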

Example Usage

from transformers import MT5ForConditionalGeneration, T5Tokenizer

# Load the fine-tuned checkpoint and its tokenizer
model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# English input with the translation task prefix
text = "translate English to Dhivehi: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
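
For an interactive demo, the same snippet can be wrapped in a small Gradio interface; a minimal sketch, where the interface labels are assumptions:

import gradio as gr
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

def translate(text):
    # Same task prefix as in the snippet above
    inputs = tokenizer("translate English to Dhivehi: " + text, return_tensors="pt")
    output = model.generate(**inputs, max_length=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

demo = gr.Interface(fn=translate, inputs="text", outputs="text", title="MT5 Dhivehi (experimental)")
demo.launch()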

Intended Use

This model is meant for:

  • Research in low-resource translation
  • Experimentation with Dhivehi-language modeling
  • Experimentation with the tokenizer (see the sketch below)
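
A minimal sketch for inspecting how the mT5 SentencePiece tokenizer segments Thaana script, assuming the same checkpoint; the example word is only an illustration:

from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# Example Dhivehi word ("Dhivehi" in Thaana script)
word = "ދިވެހި"
pieces = tokenizer.tokenize(word)
print(pieces)
print(tokenizer.convert_tokens_to_ids(pieces))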