Update tokenizer similar to https://huggingface.co/FremyCompany/roberta-large-nl-oscar23/discussions/3

by dennisc1 - opened

I read the discussion at https://huggingface.co/FremyCompany/roberta-large-nl-oscar23/discussions/3 and I think this model is also affected by this behaviour. So maybe the same update should be pushed?

Good point!

That should not affect the results in any way, though; the fix is purely cosmetic (at least until someone fine-tunes the model).

Ideally, this model should be retrained with the fixed base model, but even then it should not make a measurable difference for Dutch content. I'll have to retrain anyway when I release the final OS STS dataset, and I'll deprecate this model when I release the next one.
