MarianMT Fine-tuned for English-to-Vietnamese Translation (Opus-100)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-vi, trained on the English-Vietnamese (en-vi) subset of the Opus-100 dataset.

Model Details

  • Base Model: Helsinki-NLP/opus-mt-en-vi
  • Dataset: Opus-100 (en-vi subset)
  • Task: English to Vietnamese translation

Training

  • Environment: Google Colab (GPU)
  • Epochs: 1
  • Learning Rate: 2e-5
  • Optimizer: AdamW
  • Loss: Cross-entropy
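The card does not include the training script, so the following is not the author's code. It is a dependency-free sketch of what the listed settings compute: mean token-level cross-entropy over the target tokens, and one AdamW update at learning rate 2e-5. Assumptions: betas, epsilon and weight decay are PyTorch's `torch.optim.AdamW` defaults (the card does not state them), and label value -100 marks padded target positions, following the Hugging Face convention for seq2seq labels. The helper names (`cross_entropy`, `init_adamw_state`, `adamw_step`) are hypothetical.

```python
import math

IGNORE_INDEX = -100  # assumed label for padded target positions (Hugging Face convention)

def cross_entropy(logits, labels):
    """Mean token-level cross-entropy, skipping positions labelled IGNORE_INDEX.

    logits: list of per-position score lists (one score per vocabulary entry)
    labels: list of target token ids, same length as logits
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue
        # log of the softmax normaliser, computed stably by subtracting the max score
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[label]
        count += 1
    return total / count if count else 0.0

def init_adamw_state(n):
    """Zero-initialised first/second moment estimates for n parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}

def adamw_step(params, grads, state, lr=2e-5, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One AdamW update on a flat list of floats, modified in place and returned.

    lr=2e-5 matches the card; betas, eps and weight_decay are assumed PyTorch defaults.
    """
    state["t"] += 1
    t = state["t"]
    b1, b2 = betas
    for i, (p, g) in enumerate(zip(params, grads)):
        # exponential moving averages of the gradient and squared gradient
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g
        # bias correction for the zero initialisation
        m_hat = state["m"][i] / (1 - b1 ** t)
        v_hat = state["v"][i] / (1 - b2 ** t)
        # decoupled weight decay: shrink the weight directly, not through the gradient
        p -= lr * weight_decay * p
        p -= lr * m_hat / (math.sqrt(v_hat) + eps)
        params[i] = p
    return params

# Toy example: 3 target positions over a 3-token vocabulary; the last one is padding.
logits = [[2.0, 0.5, 0.1], [0.2, 0.2, 3.0], [1.0, 1.0, 1.0]]
labels = [0, 2, IGNORE_INDEX]
print(f"loss = {cross_entropy(logits, labels):.4f}")

params = [0.3, -1.2]
state = init_adamw_state(len(params))
adamw_step(params, [0.05, -0.4], state)
print(params)
```

With a learning rate of 2e-5, each bias-corrected Adam step moves a weight by roughly 2e-5 on the first update, which is why a single epoch at this rate makes only small adjustments to the pretrained base model.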

Evaluation

  • Metric: SacreBLEU
  • Dataset: A subset of the Opus-100 test set
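SacreBLEU reports corpus-level BLEU: clipped n-gram precisions for n = 1 to 4 are combined by a geometric mean and multiplied by a brevity penalty. The sketch below is a simplified stand-in, not SacreBLEU itself: it assumes one reference per sentence, plain whitespace tokenization and no smoothing, whereas SacreBLEU applies its own standard tokenization. The function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale (whitespace tokens, one reference, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # clipped counts: a hypothesis n-gram matches at most as often as it occurs in the reference
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += sum(h_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # geometric mean of the modified n-gram precisions
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # brevity penalty when the hypotheses are shorter than the references overall
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)

hyps = ["xin chào , bạn có khỏe không ?"]
refs = ["xin chào , bạn có khỏe không ?"]
print(corpus_bleu(hyps, refs))  # 100.0 for an exact match
```

For reportable numbers, use the `sacrebleu` package directly, e.g. `sacrebleu.corpus_bleu(hypotheses, [references]).score`, so that tokenization matches other published SacreBLEU scores.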

Usage

Example using Hugging Face Transformers:

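from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch

def load_model_and_translate(model_path, tokenizer_path, input_text):
    # Load the fine-tuned MarianMT weights and the matching tokenizer
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
    # Run on a GPU if one is available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    # Tokenize the English source text and move the tensors to the same device
    inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        # Pass input_ids and attention_mask; decode with 4-beam search
        generated_ids = model.generate(**inputs, max_length=512, num_beams=4, early_stopping=True)
    # Convert the generated token IDs back into Vietnamese text
    translated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    return translated_text

# Local directory holding the saved model and tokenizer files; replace with your own path.
# A Hub repo id such as "KarimQ45/MarianMT_opus100_en_vi" should also work here.
model_path = r"C:\Users\XOX\Downloads\NLP_T"
tokenizer_path = model_path
input_text = "Hello, how are you?"
translated_text = load_model_and_translate(model_path, tokenizer_path, input_text)
print(f"Translated text: {translated_text}")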
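The `load_model_and_translate` example reloads the model on every call, and `truncation=True` silently drops input tokens beyond the tokenizer's maximum length. For longer documents, one option is to load the model once, split the text into sentences and translate them in batches. The sketch below covers only the pure-Python splitting and batching step; the regex splitter is a naive assumption (it will, for example, split after abbreviations such as "Dr."), and `split_sentences` and `batched` are hypothetical helpers. The commented lines show how a batch could be passed to an already-loaded model and tokenizer.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

text = "Hello, how are you? I am fine. Thank you!"
for batch in batched(split_sentences(text), batch_size=2):
    print(batch)  # ['Hello, how are you?', 'I am fine.'] then ['Thank you!']
    # With model and tokenizer loaded once beforehand (not done in this sketch):
    # inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
    # outputs = model.generate(**inputs, max_length=512, num_beams=4)
    # print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

Passing the attention mask (via `**inputs`) matters once sentences of different lengths are padded together in one batch.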

Model Files

  • Repository: KarimQ45/MarianMT_opus100_en_vi
  • Format: Safetensors
  • Model size: 71.7M parameters
  • Tensor type: F32
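The 71.7M parameters stored as F32 (4 bytes each) give a rough estimate of the memory the weights alone occupy. The sketch below does that arithmetic; the F16 line is only a hypothetical comparison for a half-precision conversion, which this card does not provide. Actual inference memory is higher, since activations and beam-search state are not counted.

```python
def weight_memory_mb(num_params, bytes_per_param):
    """Approximate size of the raw weights in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

params = 71.7e6  # parameter count reported for this model
print(f"F32: {weight_memory_mb(params, 4):.1f} MB")  # F32: 286.8 MB (4 bytes per float32 weight)
print(f"F16: {weight_memory_mb(params, 2):.1f} MB")  # F16: 143.4 MB (hypothetical half-precision copy)
```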