# MT5 Dhivehi (gatitos__en_dv fine-tuned)

This model is a fine-tuned version of google/mt5-small on the gatitos__en_dv subset of the Google smol dataset.
⚠️ This is not a general-purpose translator. This fine-tune exists to test MT5 on Dhivehi and is not intended for any other use.
## Model Summary
- Base model: google/mt5-small
- Task: Translation (English → Dhivehi)
- Domain: unknown; this is an experimental fine-tune, so try single words or short phrases only
- Dataset: google/smol → gatitos__en_dv (see the loading sketch below)
- Training framework: Hugging Face Transformers
- Loss target: ~0.01
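
To inspect the training data, a minimal loading sketch is shown below; the `gatitos__en_dv` configuration name and the `train` split are assumptions based on how the google/smol dataset is referenced above.

```python
from datasets import load_dataset

# Assumed configuration and split names (not verified against the Hub):
# the English→Dhivehi GATITOS pairs under google/smol.
ds = load_dataset("google/smol", "gatitos__en_dv", split="train")
print(ds[0])  # inspect a single English–Dhivehi pair
```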
## Training Details
| Parameter | Value |
|---|---|
| Epochs | 90 |
| Batch size | 4 |
| Learning rate | 5e-5 (constant) |
| Final train loss | 0.3797 |
| Gradient norm (last) | 15.72 |
| Total steps | 89,460 |
| Samples/sec | ~14.24 |
| FLOPs | 2.36e+16 |
- Training time: ~6.98 hours (25,117 seconds)
- Optimizer: AdamW
- Scheduler: Constant (no decay)
- Logging: Weights & Biases
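
The exact training script is not included in this card; the sketch below only mirrors the reported hyperparameters with `Seq2SeqTrainingArguments` (the output directory, logging cadence, and data preprocessing are placeholders).

```python
from transformers import (
    MT5ForConditionalGeneration,
    T5Tokenizer,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")

# Values mirror the table above; the constant scheduler keeps the LR at 5e-5.
args = Seq2SeqTrainingArguments(
    output_dir="mt5-dhivehi-word-parallel",  # placeholder
    num_train_epochs=90,
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    report_to="wandb",  # Weights & Biases logging
    logging_steps=50,   # placeholder cadence
)

# trainer = Seq2SeqTrainer(model=model, args=args,
#                          train_dataset=tokenized_train,  # tokenized gatitos__en_dv pairs
#                          tokenizer=tokenizer)
# trainer.train()
```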
## Example Usage (Gradio)
```python
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained("alakxender/mt5-dhivehi-word-parallel")
tokenizer = T5Tokenizer.from_pretrained("alakxender/mt5-dhivehi-word-parallel")

# The task prefix matches the format used during fine-tuning.
text = "translate English to Dhivehi: Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
output = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
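
To expose the same pipeline as a small Gradio demo, a minimal wrapper might look like the following; the interface code is a sketch, not the original app.

```python
import gradio as gr

def translate(text: str) -> str:
    # Reuse the model/tokenizer loaded above, with the training-time task prefix.
    inputs = tokenizer("translate English to Dhivehi: " + text, return_tensors="pt")
    output = model.generate(**inputs, max_length=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)

gr.Interface(fn=translate, inputs="text", outputs="text",
             title="MT5 Dhivehi (experimental)").launch()
```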
## Intended Use
This model is meant for:
- Research in low-resource translation
- Experimentation with Dhivehi-language modeling
- Experimentation with the tokenizer