Update tokenizer similar to https://huggingface.co/FremyCompany/roberta-large-nl-oscar23/discussions/3
#2, opened by dennisc1
I read the discussion at https://huggingface.co/FremyCompany/roberta-large-nl-oscar23/discussions/3 and I think this model is also affected by this behaviour. So maybe the same update should be pushed?
Good point!
That should not affect the results in any way, though: the fix is purely cosmetic (at least until someone fine-tunes the model).
Ideally, this model should be retrained with the fixed base model, but even then it should not make a measurable difference for Dutch content. I'll have to retrain anyway when I release the final OS STS dataset, and I'll deprecate this model when I release the next one.