Strange Output

#3
by sajmahmo - opened

When the model input is "Hi", the output is the strange Spanish (es_XX) text below:
['En la misma sesión, la Comisión aprobó el proyecto de resolución A/C.1/55/L.29 sin someterlo a votación (véase párr.']
The output for "Hello" is similarly strange.

I executed the code below:
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

article_en = "Hi"
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-one-to-many-mmt", src_lang="en_XX")

model_inputs = tokenizer(article_en, return_tensors="pt", max_length=500)

generated_tokens = model.generate(
    **model_inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["es_XX"],
    max_new_tokens=500,
)
tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

I am sorry, but this model's translations are very poor.

Hey, did you find a solution for this? I'm having the same problem.
