---
license: apache-2.0
language:
- pt
- vmw
datasets:
- LIACC/Emakhuwa-Portuguese-News-MT
base_model:
- facebook/nllb-200-distilled-600M
pipeline_tag: translation
---

# NLLB-200 Portuguese-Emakhuwa Translation Example

This guide demonstrates how to use the fine-tuned NLLB-200 model for bilingual translation between Portuguese (`por_Latn`) and Emakhuwa (`vmw_Latn`).

## Prerequisites

- Install the required packages (`sentencepiece` is needed by the `NllbTokenizer`):

```bash
pip install transformers torch sentencepiece
```

## Inference

```python
import torch
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer

src_lang = "por_Latn"
tgt_lang = "vmw_Latn"
text = "Olá mundo das línguas!"  # "Hello, world of languages!"

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model_name = "felerminoali/nllb200_pt_vmw_bilingual_ver1"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
tokenizer = NllbTokenizer.from_pretrained(model_name)

# Tell the tokenizer which language code to prepend to the source text.
tokenizer.src_lang = src_lang
tokenizer.tgt_lang = tgt_lang

inputs = tokenizer(
    text, return_tensors="pt", padding=True, truncation=True,
    max_length=1024
)

model.eval()  # turn off training mode
with torch.no_grad():  # no gradients needed for inference
    result = model.generate(
        **inputs.to(model.device),
        # Force the decoder to start with the target language token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
    )

print(tokenizer.batch_decode(result, skip_special_tokens=True)[0])
```
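
Because the checkpoint is bilingual, the same pipeline translates in the opposite direction (Emakhuwa → Portuguese) by swapping the two language codes. The sketch below wraps the steps above in a reusable helper; the `translate` function name, the batching, and the beam-search settings are illustrative choices, not part of the model card.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer

model_name = "felerminoali/nllb200_pt_vmw_bilingual_ver1"
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
model.eval()
tokenizer = NllbTokenizer.from_pretrained(model_name)


def translate(texts, src_lang, tgt_lang, max_length=1024, num_beams=4):
    """Translate a batch of sentences between por_Latn and vmw_Latn, in either direction."""
    tokenizer.src_lang = src_lang  # language code prepended to the source sentences
    inputs = tokenizer(
        texts, return_tensors="pt", padding=True, truncation=True, max_length=max_length
    ).to(model.device)
    with torch.no_grad():
        generated = model.generate(
            **inputs,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
            num_beams=num_beams,
            max_length=max_length,
        )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


# Portuguese -> Emakhuwa
print(translate(["Olá mundo das línguas!"], "por_Latn", "vmw_Latn"))

# Emakhuwa -> Portuguese: same model, language codes swapped
# (replace the placeholder with an actual Emakhuwa sentence).
print(translate(["<Emakhuwa sentence here>"], "vmw_Latn", "por_Latn"))
```

Passing a list of sentences keeps padding and decoding consistent when translating several inputs at once.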