Can't load tokenizer using from_pretrained, please update its configuration: Tokenizer class RobertaJapaneseTokenizer does not exist or is not currently imported.

#1
by adno - opened

I am getting this error (both locally and on the model's page on huggingface.co):

Can't load tokenizer using from_pretrained, please update its configuration: Tokenizer class RobertaJapaneseTokenizer does not exist or is not currently imported.

It seems that the tokenizer class required by this model's configuration is not part of the transformers codebase. Is there any chance of fixing this? I'm afraid that without the tokenizer the model is not of much value.
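One way to confirm this diagnosis locally: AutoTokenizer reads the `tokenizer_class` field from the repo's tokenizer_config.json and looks that name up among the tokenizer classes transformers provides. The sketch below approximates that lookup with `hasattr`. It is not transformers' actual resolution code, and the helper name `missing_tokenizer_class` is hypothetical.

```python
import json
from typing import Any, Optional


def missing_tokenizer_class(tokenizer_config_json: str, namespace: Any) -> Optional[str]:
    """Return the tokenizer class named in a tokenizer_config.json string if
    `namespace` (e.g. the imported `transformers` module) does not expose it,
    otherwise return None. Hypothetical helper, an approximation only.
    """
    config = json.loads(tokenizer_config_json)
    name = config.get("tokenizer_class")
    if name is None:
        # No explicit class: AutoTokenizer resolves the tokenizer from other
        # config fields instead, so there is nothing to check here.
        return None
    # Approximate the name lookup: accept either the slow class or its
    # "Fast" counterpart being available under that name.
    if hasattr(namespace, name) or hasattr(namespace, name + "Fast"):
        return None
    return name
```

For a real check you could `import transformers` and pass it as `namespace`, together with the text of the model's tokenizer_config.json (for example, a copy fetched with `huggingface_hub.hf_hub_download`). A non-None result means the installed transformers version has no class by that name, which matches the error above.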

I am currently facing the same issue.
Did you ever figure out a solution to this?

No, I didn't. There's a RoBERTa model from Waseda that works fine: nlp-waseda/roberta-base-japanese-with-auto-jumanpp. (There's this version, plus a separate one that expects input already pretokenized with Juman++.) In my experience, though, the BERT from Tohoku University, tohoku-nlp/bert-base-japanese-v2, performed better, both in an MLM-based task using the pretrained model and in a separate fine-tuning application. For a performance comparison of different sizes of these two models and a few others, see https://github.com/yahoojapan/JGLUE#baseline-scores
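Because the workaround here is to switch to a different checkpoint, here is a minimal sketch that tries candidate models in order and keeps the first whose tokenizer loads. The helper name `load_first_available` is hypothetical. `loader` stands in for something like `AutoTokenizer.from_pretrained`, which raises an exception when the configured tokenizer class cannot be found.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def load_first_available(model_ids: Iterable[str], loader: Callable[[str], T]) -> Tuple[str, T]:
    """Try each model id in order and return (model_id, loaded_object) for the
    first one that `loader` loads without raising. Hypothetical helper."""
    errors: List[str] = []
    for model_id in model_ids:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # broad on purpose: record the failure, try the next candidate
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("No candidate could be loaded:\n" + "\n".join(errors))
```

With transformers installed, you might call `load_first_available(["tohoku-nlp/bert-base-japanese-v2", "nlp-waseda/roberta-base-japanese-with-auto-jumanpp"], AutoTokenizer.from_pretrained)`. Both Japanese tokenizers depend on extra morphological-analyzer software (MeCab-based packages for the Tohoku model, Juman++ for the Waseda one), so check each model card for installation steps.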
