Update tokenizer similar to https://huggingface.co/FremyCompany/roberta-large-nl-oscar23/discussions/3

by denniscraandijk - opened Dec 7, 2023

Dec 7, 2023

I read the discussion at https://huggingface.co/FremyCompany/roberta-large-nl-oscar23/discussions/3 and I think this model is also affected by this behaviour. So maybe the same update should be pushed?

FremyCompany

Owner, Dec 7, 2023 (edited Dec 7, 2023)

Good point!

That should not affect the results in any way, though; the fix is purely cosmetic (at least until someone fine-tunes).

Ideally, this model should be retrained with the fixed base model, but even then it should not make a measurable difference for Dutch content. I'll have to retrain anyway when I release the final OS STS dataset, and I'll deprecate this model when I release the next one.
