Multilingual?
Your great dataset is multilingual (especially FR/DE/IT), but this finetune is labelled English-only.
Has the model really been finetuned only on the English subset of the dataset, or might it also work well for multilingual use cases (primarily DE/FR)?
Thank you anyway for this great contribution!
Thanks a lot for your comment
@Werner
. Very interesting to hear. We haven't run any multilingual checks on the English anonymiser yet, but I wouldn't be surprised if it transfers. That said, we have now also trained the multilingual models:
https://huggingface.co/ai4privacy/llama-ai4privacy-multilingual-anonymiser-openpii
https://huggingface.co/ai4privacy/llama-ai4privacy-multilingual-categorical-anonymiser-openpii
Feel free to try them out. The scores are of course lower than for English (especially because I forgot to exclude the experimental languages, something to fix in the near future).
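If it helps anyone getting started, here is a minimal sketch of trying the multilingual anonymiser with the transformers library. The task type ("token-classification"), the aggregation strategy, and the sample text are my assumptions, not confirmed by the model card, so please check the card for the intended usage:

```python
# Minimal sketch -- assumes the model exposes a token-classification (PII tagging) head.
# Verify the actual task and labels on the model card before relying on this.
from transformers import pipeline

anonymiser = pipeline(
    "token-classification",
    model="ai4privacy/llama-ai4privacy-multilingual-anonymiser-openpii",
    aggregation_strategy="simple",  # merge sub-word tokens into whole PII spans
)

# Hypothetical German example sentence containing a name and an email address.
text = "Bitte kontaktieren Sie Anna Müller unter anna.mueller@example.de."

for entity in anonymiser(text):
    # Each detected entity carries the predicted PII label, the matched span, and a score.
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```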
This model is also now available in our Space, where it runs locally in your browser:
https://huggingface.co/spaces/ai4privacy/general-english-anonymiser-openpii-500k
This English one does seem to work really well for many cases, however. After working with it for two weeks, the only complaint so far is that it is still too sensitive in scientific text, e.g. it flags equations as sensitive. Let me know what you think, and thank you for your support.