Model Card for mbart-large-50-mmt-ko-vi
This model is fine-tuned from mBART-large-50 on multilingual translation data of Korean legal documents for the Korean-to-Vietnamese translation task.
Table of Contents
- Model Card for mbart-large-50-mmt-ko-vi
- Table of Contents
- Model Details
- Uses
- Bias, Risks, and Limitations
- Training Details
- Evaluation
- Environmental Impact
- Technical Specifications
- Citation
- Model Card Contact
Model Details
Model Description
- Developed by: Jaeyoon Myoung, Heewon Kwak
- Shared by: ofu
- Model type: Language model (Translation)
- Language(s) (NLP): Korean, Vietnamese
- License: Apache 2.0
- Parent Model: facebook/mbart-large-50-many-to-many-mmt
Uses
Direct Use
This model is used for text translation from Korean to Vietnamese.
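A minimal usage sketch, assuming the standard Hugging Face Transformers API for mBART-50; the example sentence is illustrative:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned checkpoint (repo id taken from this card)
model_name = "ofu-ai/mbart-large-50-mmt-ko-vi"
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang="ko_KR", tgt_lang="vi_VN")

# Korean source sentence: "This contract takes effect from the date of signing."
korean_text = "이 계약은 서명일로부터 효력이 발생한다."

inputs = tokenizer(korean_text, return_tensors="pt")
generated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["vi_VN"],  # force Vietnamese as the target language
    max_length=128,
)
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```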
Out-of-Scope Use
This model is not suitable for translation tasks involving language pairs other than Korean-to-Vietnamese.
Bias, Risks, and Limitations
The model may contain biases inherited from the training data and may produce inappropriate translations for sensitive topics.
Training Details
Training Data
The model was trained using multilingual translation data of Korean legal documents provided by AI Hub.
Training Procedure
Preprocessing
- Removed unnecessary whitespace, special characters, and line breaks.
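The exact cleanup rules are not published; the sketch below illustrates the kind of preprocessing described above, with the regex patterns being assumptions:

```python
import re

def clean_text(text: str) -> str:
    # Normalize line breaks, drop stray special characters, and collapse repeated whitespace.
    # The set of characters kept here is an assumption, not the authors' exact rule.
    text = text.replace("\r", " ").replace("\n", " ")
    text = re.sub(r"[^\w\s.,!?%()\[\]'\"-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```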
Speeds, Sizes, Times
- Training Time: 1 hour 25 minutes (5,100 seconds) on an NVIDIA RTX 4090
- Throughput: ~3.51 samples/second
- Total Training Samples: 17,922
- Model Checkpoint Size: Approximately 2.3GB
- Gradient Accumulation Steps: 4
- FP16 Mixed Precision Enabled: Yes
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8 (per device)
- eval_batch_size: 8 (per device)
- seed: 42
- distributed_type: single-node (n_gpu=1; no distributed training setup is indicated)
- num_devices: 1 (single NVIDIA GPU: RTX 4090)
- gradient_accumulation_steps: 4
- total_train_batch_size: 32 (calculated as train_batch_size * gradient_accumulation_steps)
- total_eval_batch_size: 8 (evaluation does not use gradient accumulation)
- optimizer: AdamW (indicated by optim=OptimizerNames.ADAMW_TORCH)
- lr_scheduler_type: linear (indicated by lr_scheduler_type=SchedulerType.LINEAR)
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
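For reference, a minimal Seq2SeqTrainingArguments sketch mirroring the values above; the output directory and the predict_with_generate flag are illustrative assumptions, not taken from the original run:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./mbart-large-50-mmt-ko-vi",  # assumption, for illustration only
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective train batch size: 8 * 4 = 32
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch",
    fp16=True,                       # mixed precision, as noted above
    seed=42,
    predict_with_generate=True,      # assumption: generate during evaluation for BLEU
)
```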
Evaluation
Testing Data
The evaluation used a test set extracted from Korean labor-law precedents.
Metrics
- BLEU
Results
- BLEU Score: 29.69
- Accuracy: 95.65%
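A minimal sketch of computing corpus-level BLEU with sacrebleu (listed under Software below); the hypothesis/reference pair shown is hypothetical and not from the actual test set:

```python
import sacrebleu

# Hypothetical example; the real evaluation data (Korean labor-law precedents) is not published.
hypotheses = ["Hợp đồng này có hiệu lực kể từ ngày ký."]
references = [["Hợp đồng này có hiệu lực từ ngày ký kết."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```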
Environmental Impact
- Hardware Type: NVIDIA RTX 4090
- Power Consumption: ~450W
- Training Time: 1 hour 25 minutes (1.42 hours)
- Electricity Consumption: ~0.639 kWh
- Carbon Emission Factor (South Korea): 0.459 kgCO₂/kWh
- Estimated Carbon Emissions: ~0.293 kgCO₂
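For clarity, the electricity and emissions estimates follow directly from the figures above:

$$
0.450\ \mathrm{kW} \times 1.42\ \mathrm{h} \approx 0.639\ \mathrm{kWh},
\qquad 0.639\ \mathrm{kWh} \times 0.459\ \mathrm{kg\,CO_2/kWh} \approx 0.293\ \mathrm{kg\,CO_2}
$$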
Technical Specifications
Model Architecture: Based on mBART-large-50, a multilingual sequence-to-sequence Transformer designed for translation tasks. The architecture includes 12 encoder and 12 decoder layers with a hidden size of 1,024.
Software:
- sacrebleu for evaluation
- Hugging Face Transformers library for fine-tuning
- Python 3.11.9 and PyTorch 2.4.0
Hardware: NVIDIA RTX 4090 with 24GB VRAM was used for training and inference.
Tokenization and Preprocessing: The tokenization was performed using the SentencePiece model pre-trained with mBART-large-50. Text preprocessing included removing special characters, unnecessary whitespace, and normalizing line breaks.
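As an illustration of the SentencePiece tokenization, the pre-trained mBART-50 tokenizer can be inspected directly; the example word is illustrative:

```python
from transformers import MBart50TokenizerFast

tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt", src_lang="ko_KR", tgt_lang="vi_VN"
)
# Inspect the SentencePiece subword pieces for a Korean word ("employment contract")
print(tokenizer.tokenize("근로계약서"))
```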
Citation
Currently, there are no papers or blog posts available for this model.
Model Card Contact
- Contact Email: [email protected] | [email protected]