MarianMT Fine-tuned on English to Vietnamese (Opus100)
This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-vi, trained on the English to Vietnamese subset of the Opus-100 dataset.
Model Details
- Base Model: Helsinki-NLP/opus-mt-en-vi
- Dataset: Opus-100 (en-vi subset)
- Task: English to Vietnamese translation
Training
- Environment: Google Colab (GPU)
- Epochs: 1
- Learning Rate: 2e-5
- Optimizer: AdamW
- Loss: Cross-entropy
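The hyperparameters above can be illustrated with a minimal PyTorch sketch of a single optimization step. The card does not say whether training used the Transformers `Trainer` or a custom loop, so this is only an illustration: `toy_model`, `VOCAB_SIZE`, `HIDDEN`, and `training_step` are hypothetical stand-ins, and a small linear layer takes the place of the real MarianMT network. The learning rate, optimizer, and loss follow the list above; masking padded label positions with `-100` is the usual Transformers convention and is assumed here.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy sizes for illustration only; the real model is far larger.
VOCAB_SIZE, HIDDEN = 32, 16

# Hypothetical stand-in for the decoder's output projection.
toy_model = nn.Linear(HIDDEN, VOCAB_SIZE)

# AdamW with the learning rate listed in the card.
optimizer = torch.optim.AdamW(toy_model.parameters(), lr=2e-5)

# Token-level cross-entropy; positions labelled -100 (padding) are ignored.
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)


def training_step(hidden_states, labels):
    """Run one forward/backward pass and optimizer update; return the loss."""
    logits = toy_model(hidden_states)  # shape: (batch, seq_len, vocab)
    loss = loss_fn(logits.view(-1, VOCAB_SIZE), labels.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Random tensors standing in for decoder states and target token IDs.
hidden_states = torch.randn(2, 5, HIDDEN)
labels = torch.randint(0, VOCAB_SIZE, (2, 5))
labels[:, -1] = -100  # pretend the last position of each sequence is padding

loss_value = training_step(hidden_states, labels)
print(f"Loss after one step: {loss_value:.4f}")
```

In an actual fine-tuning run, the logits would come from the MarianMT model's forward pass over tokenized Opus-100 sentence pairs, and the step would be repeated over the whole training split for one epoch.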
Evaluation
- Metric: SacreBLEU
- Dataset: Subset of Opus-100 test set
Usage
Example using Hugging Face Transformers:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch


def load_model_and_translate(model_path, tokenizer_path, input_text):
    # Load the fine-tuned model and its tokenizer
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

    # Tokenize the input; overly long inputs are truncated
    inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)

    # Beam-search generation; **inputs passes the attention mask with the input IDs
    with torch.no_grad():
        generated_ids = model.generate(**inputs, max_length=512, num_beams=4, early_stopping=True)

    # Decode the first (and only) generated sequence back to text
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)


# Local directory containing the saved model and tokenizer files
model_path = r"C:\Users\XOX\Downloads\NLP_T"
tokenizer_path = model_path

input_text = "Hello, how are you?"
translated_text = load_model_and_translate(model_path, tokenizer_path, input_text)
print(f"Translated text: {translated_text}")
Model tree for KarimQ45/MarianMT_opus100_en_vi
Base model
Helsinki-NLP/opus-mt-en-vi