Keep certain words untranslated in the output
I'm using M2M100_418M for multilingual translation, but I have a list of words that I'd like to keep exactly as they are in the translation output, no matter what.
Keep in mind that these words are not ones that usually stay unchanged in ordinary translations (so it's very unlikely that the model was trained to leave them as they are).
Is there a reliable way to do this?
example:
original text: "I checked my < Facebook > feed and I saw a < breaking bad > series post."
translation: "... < Facebook > .... < breaking bad >... "
PS: I tried putting the desired word between brackets <> and [] but the results were very unstable.
The transformers package has a feature called "constrained beam search" which can force certain words or phrases to appear in the generated text.
https://huggingface.co/blog/constrained-beam-search
I believe it can be applied to your problem.
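Here is a rough sketch of how that could look with M2M100, in case it helps. It assumes a transformers version whose `generate` still accepts `force_words_ids` with `num_beams > 1`, and that `facebook/m2m100_418M` is the checkpoint you mean. The function names, the `en`/`fr` language pair, and the beam and length settings are just placeholders I picked for illustration. I have not checked the output quality on your example.

```python
def build_force_words_ids(tokenizer, phrases):
    """Tokenize each protected phrase without special tokens.

    The result is a list of token-id lists, one per phrase, which is
    the shape that `force_words_ids` expects.
    """
    return [tokenizer(p, add_special_tokens=False).input_ids for p in phrases]


def translate_keeping(text, phrases, src_lang="en", tgt_lang="fr",
                      model_name="facebook/m2m100_418M"):
    """Hypothetical helper: translate `text` while forcing `phrases` to appear."""
    # Imported here so the helper above can be used without loading the model.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)

    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt")

    outputs = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
        force_words_ids=build_force_words_ids(tokenizer, phrases),
        num_beams=5,          # constrained decoding needs beam search
        max_new_tokens=128,   # assumed limit, adjust for your texts
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]


if __name__ == "__main__":
    print(translate_keeping(
        "I checked my Facebook feed and I saw a breaking bad series post.",
        ["Facebook", "breaking bad"],
    ))
```

Note that this only guarantees that the phrases show up somewhere in the output. It does not stop the model from also producing a translated version of the same phrase, so you may still want to check the results.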
Yes! Otherwise, you could implement a custom logits processor that favors these words (by increasing their probability of appearing).
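A minimal sketch of such a logits processor, under a few assumptions: all names here are made up for illustration, the bias value of 5.0 is arbitrary, and the boost is applied to each subword token separately until it has appeared in the hypothesis. That makes it a soft nudge only. It does not enforce the order of subwords inside a phrase and does not guarantee that the words appear.

```python
def collect_token_ids(tokenizer, phrases):
    """Return the sorted, unique subword ids that make up the protected phrases."""
    ids = set()
    for phrase in phrases:
        ids.update(tokenizer(phrase, add_special_tokens=False).input_ids)
    return sorted(ids)


def pending_token_ids(generated_ids, keep_ids):
    """Protected ids that have not yet been generated in this hypothesis."""
    seen = set(generated_ids)
    return [t for t in keep_ids if t not in seen]


def make_keep_words_processor(keep_ids, bias=5.0):
    """Build a processor that adds `bias` to the scores of pending protected ids."""
    # Imported here so the pure helpers above work without transformers installed.
    from transformers import LogitsProcessor

    class KeepWordsLogitsProcessor(LogitsProcessor):
        def __call__(self, input_ids, scores):
            # input_ids: (batch * beams, generated_len)
            # scores:    (batch * beams, vocab_size)
            for row, generated in enumerate(input_ids.tolist()):
                pending = pending_token_ids(generated, keep_ids)
                if pending:
                    scores[row, pending] += bias
            return scores

    return KeepWordsLogitsProcessor()
```

To use it, something like `model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("fr"), logits_processor=LogitsProcessorList([make_keep_words_processor(keep_ids)]))` should work, with `LogitsProcessorList` imported from transformers and `keep_ids = collect_token_ids(tokenizer, ["Facebook", "breaking bad"])`. You would probably need to tune the bias: too low has no effect, too high can make the output degenerate.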