Multilingual?
Your great dataset is multilingual (especially FR/DE/IT), but this finetune is labelled English-only.
Has the model really been finetuned only on the English subset of the dataset, or might it also work well for multilingual use cases (primarily DE/FR)?
Thank you anyway for this great contribution!
Thanks a lot for your comment
@Werner
. Very interesting to hear. We haven't run any multilingual checks on the English anonymiser yet, but I wouldn't be surprised if it transfers. That said, we have now also trained the multilingual models:
https://huggingface.co/ai4privacy/llama-ai4privacy-multilingual-anonymiser-openpii
https://huggingface.co/ai4privacy/llama-ai4privacy-multilingual-categorical-anonymiser-openpii
Feel free to try them out. The scores are of course lower than for English (especially because I forgot to exclude the experimental languages, something to fix in the near future).
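If it helps anyone getting started, here is a minimal sketch of trying the multilingual anonymiser with the transformers library. The task type ("token-classification"), the aggregation strategy, and the sample text are my assumptions, not confirmed by the model card, so please check the card for the intended usage:

```python
# Minimal sketch -- assumes the model exposes a token-classification (PII tagging) head.
# Verify the actual task and labels on the model card before relying on this.
from transformers import pipeline

anonymiser = pipeline(
    "token-classification",
    model="ai4privacy/llama-ai4privacy-multilingual-anonymiser-openpii",
    aggregation_strategy="simple",  # merge sub-word tokens into whole PII spans
)

# Hypothetical German example sentence containing a name and an email address.
text = "Bitte kontaktieren Sie Anna Müller unter anna.mueller@example.de."

for entity in anonymiser(text):
    # Each detected entity carries the predicted PII label, the matched span, and a score.
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```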
This model is also now available in our Space, where it runs locally in your browser:
https://huggingface.co/spaces/ai4privacy/general-english-anonymiser-openpii-500k
This English one does seem to work really well for many cases, however. After working with it for two weeks, the only complaint so far is that it is still too sensitive in scientific text, e.g. it flags equations as sensitive. Let me know what you think, and thank you for your support.