CTranslate2 NLLB-200 Translation Example
This guide demonstrates how to use a CTranslate2-quantized (int8) version of the NLLB-200 model for bilingual translation between Portuguese (por_Latn) and Emakhuwa (vmw_Latn).
Prerequisites
- Install required packages:
pip install ctranslate2 sentencepiece
# Download model to local folder
git lfs install # Enable Git LFS for your user (the git-lfs package must already be installed)
git clone https://huggingface.co/felerminoali/ct2_nllb200_pt_vmw_bilingual_int8_ver1
cd ct2_nllb200_pt_vmw_bilingual_int8_ver1
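git lfs pull # Download large files (LFS-tracked)
cd .. # Return to the parent folder; the script below expects ./ct2_nllb200_pt_vmw_bilingual_int8_ver1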
Inference
import os
import ctranslate2
import sentencepiece as spm
model_name = "ct2_nllb200_pt_vmw_bilingual_int8_ver1"
model_name_hf = f"felerminoali/{model_name}"  # Hugging Face repo ID (for reference only)
local_dir = f"./{model_name}"  # Folder created by the git clone step
src_lang = "por_Latn"
tgt_lang = "vmw_Latn"
sentence = "Olá mundo das línguas!"
print(f"Using model in {local_dir}")
# [Modify] Set paths to the CTranslate2 and SentencePiece models
ct_model_path = local_dir
sp_model_path = os.path.join(local_dir, "sentencepiece.bpe.model")
device = "cpu"  # or "cuda" if a GPU is available
beam_size = 4
# Load the source SentencePiece model
sp = spm.SentencePieceProcessor()
sp.load(sp_model_path)
# Load the CTranslate2 model
translator = ctranslate2.Translator(ct_model_path, device=device)
source_sents = [sentence]
target_prefix = [[tgt_lang]] * len(source_sents)
# Subword the source sentences
source_sents_subworded = sp.encode(source_sents, out_type=str)
source_sents_subworded = [[src_lang] + sent + ["</s>"] for sent in source_sents_subworded]
# Translate the source sentences (the translator is already loaded above)
translations = translator.translate_batch(
    source_sents_subworded,
    batch_type="tokens",
    max_batch_size=2024,
    beam_size=beam_size,
    target_prefix=target_prefix,
)
translations = [translation.hypotheses[0] for translation in translations]
# Desubword the target sentences and drop the leading target-language token
translations_desubword = sp.decode(translations)
translations_desubword = [sent[len(tgt_lang):].strip() for sent in translations_desubword]
print("Translations:", *translations_desubword, sep="\n")
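The steps above (add the language token and </s>, translate with a target prefix, remove the prefix, decode) can be wrapped in a reusable function. The following is a minimal sketch under a few assumptions: sp is an already loaded SentencePieceProcessor and translator is an already loaded ctranslate2.Translator, as in the script above. The names build_source, strip_target_prefix and translate are hypothetical helpers, not part of either library.

```python
from typing import List, Sequence

EOS = "</s>"


def build_source(pieces: Sequence[str], src_lang: str) -> List[str]:
    """Wrap subword pieces as in the script: language token first, </s> last."""
    return [src_lang, *pieces, EOS]


def strip_target_prefix(tokens: Sequence[str], tgt_lang: str) -> List[str]:
    """Drop the forced target-language token if the hypothesis starts with it."""
    if tokens and tokens[0] == tgt_lang:
        return list(tokens[1:])
    return list(tokens)


def translate(sentences, sp, translator, src_lang, tgt_lang, beam_size=4):
    """Translate a list of sentences.

    Assumes `sp` is a loaded SentencePieceProcessor and `translator`
    is a loaded ctranslate2.Translator.
    """
    pieces = sp.encode(list(sentences), out_type=str)
    source = [build_source(p, src_lang) for p in pieces]
    results = translator.translate_batch(
        source,
        target_prefix=[[tgt_lang]] * len(source),
        beam_size=beam_size,
    )
    # Remove the language token at the token level, before decoding.
    hypotheses = [strip_target_prefix(r.hypotheses[0], tgt_lang) for r in results]
    return sp.decode(hypotheses)
```

With the objects from the script, translate([sentence], sp, translator, src_lang, tgt_lang) would return a list with one translated string. Removing the language token before decoding avoids slicing the decoded string by character count, which leaves a leading space behind.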
Base model: facebook/nllb-200-distilled-600M