felerminoali committed
Commit
a18b604
·
verified ·
1 Parent(s): 8e38b9a

Create README.md

Files changed (1): README.md (+54 -0)
---
license: apache-2.0
language:
- pt
- vmw
datasets:
- LIACC/Emakhuwa-Portuguese-News-MT
base_model:
- facebook/nllb-200-distilled-600M
pipeline_tag: translation
---

# NLLB-200 Translation Example

This guide demonstrates how to use a fine-tuned NLLB-200 model for bilingual translation between Portuguese (`por_Latn`) and Emakhuwa (`vmw_Latn`).

## Prerequisites

- Install the required packages (`sentencepiece` is needed by `NllbTokenizer`):
```bash
pip install transformers torch sentencepiece
```
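
Optionally, the checkpoint can be pre-downloaded into the local Hugging Face cache so the first inference call does not block on the download. This is a minimal sketch, not part of the original card; it uses `huggingface_hub.snapshot_download` (shipped as a dependency of `transformers`) with the same repo id as the inference example below:

```python
from huggingface_hub import snapshot_download

# Fetch the fine-tuned checkpoint ahead of time (optional)
snapshot_download(repo_id="felerminoali/nllb_bilingual_pt-vmw_65k")
```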

## Inference

```python
from transformers import AutoModelForSeq2SeqLM, NllbTokenizer
import torch

src_lang = "por_Latn"   # source: Portuguese
tgt_lang = "vmw_Latn"   # target: Emakhuwa
text = "Olá, mundo das línguas!"

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned NLLB-200 checkpoint and its tokenizer
model_name = "felerminoali/nllb_bilingual_pt-vmw_65k"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
tokenizer = NllbTokenizer.from_pretrained(model_name)

# Tell the tokenizer which language pair it is preparing inputs for
tokenizer.src_lang = src_lang
tokenizer.tgt_lang = tgt_lang

inputs = tokenizer(
    text, return_tensors="pt", padding=True, truncation=True,
    max_length=1024
)

model.eval()  # turn off training mode (dropout, etc.)
result = model.generate(
    **inputs.to(model.device),
    # Force the decoder to start with the target-language token
    forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
)

print(tokenizer.batch_decode(result, skip_special_tokens=True)[0])
```
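
For convenience, the same steps can be wrapped in a small helper that handles batches and either direction. This is a sketch, not part of the original example: the `translate` helper and its defaults are illustrative, it reuses the `model` and `tokenizer` loaded above, and the reverse direction assumes the bilingual checkpoint was trained both ways.

```python
def translate(texts, src_lang, tgt_lang, max_new_tokens=256):
    """Translate a batch of sentences with the `model`/`tokenizer` loaded above."""
    tokenizer.src_lang = src_lang  # switch the tokenizer's source language
    inputs = tokenizer(
        texts, return_tensors="pt", padding=True, truncation=True, max_length=1024
    ).to(model.device)
    with torch.no_grad():
        generated = model.generate(
            **inputs,
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
            max_new_tokens=max_new_tokens,
        )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

# Portuguese -> Emakhuwa
print(translate(["Bom dia!", "Como está?"], "por_Latn", "vmw_Latn"))

# Emakhuwa -> Portuguese: swap the language codes
# (assumes the bilingual checkpoint covers this direction)
# print(translate(["<Emakhuwa sentence>"], "vmw_Latn", "por_Latn"))
```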