Model Card for mbart-large-50-mmt-ko-vi
This model is fine-tuned from mBART-large-50 on multilingual translation data of Korean legal documents for the Korean-to-Vietnamese translation task.
Table of Contents
- Model Card for mbart-large-50-mmt-ko-vi
- Table of Contents
- Model Details
- Uses
- Bias, Risks, and Limitations
- Training Details
- Evaluation
- Environmental Impact
- Technical Specifications
- Citation
- Model Card Contact
Model Details
Model Description
- Developed by: Jaeyoon Myoung, Heewon Kwak
- Shared by: ofu
- Model type: Language model (Translation)
- Language(s) (NLP): Korean, Vietnamese
- License: Apache 2.0
- Parent Model: facebook/mbart-large-50-many-to-many-mmt
Uses
Direct Use
This model is used for text translation from Korean to Vietnamese.
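A minimal usage sketch, assuming the standard Hugging Face Transformers API for mBART-50; the example sentence is illustrative:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned checkpoint (repo id taken from this card)
model_name = "ofu-ai/mbart-large-50-mmt-ko-vi"
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name, src_lang="ko_KR", tgt_lang="vi_VN")

# Korean source sentence: "This contract takes effect from the date of signing."
korean_text = "이 계약은 서명일로부터 효력이 발생한다."

inputs = tokenizer(korean_text, return_tensors="pt")
generated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["vi_VN"],  # force Vietnamese as the target language
    max_length=128,
)
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```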
Out-of-Scope Use
This model is not suitable for translation tasks involving language pairs other than Korean-to-Vietnamese.
Bias, Risks, and Limitations
The model may contain biases inherited from the training data and may produce inappropriate translations for sensitive topics.
Training Details
Training Data
The model was trained using multilingual translation data of Korean legal documents provided by AI Hub.
Training Procedure
Preprocessing
- Removed unnecessary whitespace, special characters, and line breaks.
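The exact cleanup rules are not published; the sketch below illustrates the kind of preprocessing described above, with the regex patterns being assumptions:

```python
import re

def clean_text(text: str) -> str:
    # Normalize line breaks, drop stray special characters, and collapse repeated whitespace.
    # The set of characters kept here is an assumption, not the authors' exact rule.
    text = text.replace("\r", " ").replace("\n", " ")
    text = re.sub(r"[^\w\s.,!?%()\[\]'\"-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```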
Speeds, Sizes, Times
- Training Time: 1 hour 25 minutes (5,100 seconds) on an NVIDIA RTX 4090
- Throughput: ~3.51 samples/second
- Total Training Samples: 17,922
- Model Checkpoint Size: Approximately 2.3GB
- Gradient Accumulation Steps: 4
- FP16 Mixed Precision Enabled: Yes
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8 (per device)
- eval_batch_size: 8 (per device)
- seed: 42
- distributed_type: single-node (n_gpu=1; no distributed training setup is indicated)
- num_devices: 1 (single NVIDIA GPU: RTX 4090)
- gradient_accumulation_steps: 4
- total_train_batch_size: 32 (calculated as train_batch_size * gradient_accumulation_steps)
- total_eval_batch_size: 8 (evaluation does not use gradient accumulation)
- optimizer: AdamW (indicated by optim=OptimizerNames.ADAMW_TORCH)
- lr_scheduler_type: linear (indicated by lr_scheduler_type=SchedulerType.LINEAR)
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
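For reference, a minimal Seq2SeqTrainingArguments sketch mirroring the values above; the output directory and the predict_with_generate flag are illustrative assumptions, not taken from the original run:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./mbart-large-50-mmt-ko-vi",  # assumption, for illustration only
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # effective train batch size: 8 * 4 = 32
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_steps=100,
    optim="adamw_torch",
    fp16=True,                       # mixed precision, as noted above
    seed=42,
    predict_with_generate=True,      # assumption: generate during evaluation for BLEU
)
```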
Evaluation
Testing Data
The evaluation used a test set extracted from Korean labor-law precedents.
Metrics
- BLEU
Results
- BLEU Score: 29.69
- Accuracy: 95.65%
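A minimal sketch of computing corpus-level BLEU with sacrebleu (listed under Software below); the hypothesis/reference pair shown is hypothetical and not from the actual test set:

```python
import sacrebleu

# Hypothetical example; the real evaluation data (Korean labor-law precedents) is not published.
hypotheses = ["Hợp đồng này có hiệu lực kể từ ngày ký."]
references = [["Hợp đồng này có hiệu lực từ ngày ký kết."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```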
Environmental Impact
- Hardware Type: NVIDIA RTX 4090
- Power Consumption: ~450W
- Training Time: 1 hour 25 minutes (1.42 hours)
- Electricity Consumption: ~0.639 kWh
- Carbon Emission Factor (South Korea): 0.459 kgCO₂/kWh
- Estimated Carbon Emissions: ~0.293 kgCO₂
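For clarity, the electricity and emissions estimates follow directly from the figures above:

$$
0.450\ \mathrm{kW} \times 1.42\ \mathrm{h} \approx 0.639\ \mathrm{kWh},
\qquad 0.639\ \mathrm{kWh} \times 0.459\ \mathrm{kg\,CO_2/kWh} \approx 0.293\ \mathrm{kg\,CO_2}
$$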
Technical Specifications
Model Architecture: Based on mBART-large-50, a multilingual sequence-to-sequence Transformer designed for translation tasks. The architecture includes 12 encoder and 12 decoder layers with a hidden size of 1,024.
Software:
- sacrebleu for evaluation
- Hugging Face Transformers library for fine-tuning
- Python 3.11.9 and PyTorch 2.4.0
Hardware: NVIDIA RTX 4090 with 24GB VRAM was used for training and inference.
Tokenization and Preprocessing: The tokenization was performed using the SentencePiece model pre-trained with mBART-large-50. Text preprocessing included removing special characters, unnecessary whitespace, and normalizing line breaks.
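As an illustration of the SentencePiece tokenization, the pre-trained mBART-50 tokenizer can be inspected directly; the example word is illustrative:

```python
from transformers import MBart50TokenizerFast

tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50-many-to-many-mmt", src_lang="ko_KR", tgt_lang="vi_VN"
)
# Inspect the SentencePiece subword pieces for a Korean word ("employment contract")
print(tokenizer.tokenize("근로계약서"))
```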
Citation
Currently, there are no papers or blog posts available for this model.
Model Card Contact
- Contact Email: [email protected] | [email protected]