ai4privacy/llama-ai4privacy-multilingual-anonymiser-openpii

license: mit
datasets:
  - ai4privacy/open-pii-masking-500k-ai4privacy
language:
  - fr
  - en
  - de
  - te
  - hi
  - it
  - es
  - nl
base_model:
  - answerdotai/ModernBERT-base
library_name: transformers
tags:
  - PII

Evaluation Metrics

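The table below reports per-label results: true positives (TP), false positives (FP), false negatives (FN), and the metrics derived from them. For every PII label the Accuracy column equals Recall; Precision, Recall, and F1 are listed as n/a for the O (Non-PII) class.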

Label	TP	FP	FN	Accuracy	Precision	Recall	F1 Score
SURNAME	3722	0	28	99.25%	100.0%	99.25%	99.63%
O (Non-PII)	0	400	0	99.30%	n/a	n/a	n/a
TIME	1936	0	0	100.0%	100.0%	100.0%	100.0%
DRIVERLICENSENUM	505	0	2	99.61%	100.0%	99.61%	99.80%
PASSPORTNUM	564	0	2	99.65%	100.0%	99.65%	99.82%
GIVENNAME	7548	0	172	97.77%	100.0%	97.77%	98.87%
TELEPHONENUM	3641	0	0	100.0%	100.0%	100.0%	100.0%
BUILDINGNUM	407	0	19	95.54%	100.0%	95.54%	97.72%
AGE	168	0	1	99.41%	100.0%	99.41%	99.70%
DATE	2335	0	0	100.0%	100.0%	100.0%	100.0%
CITY	1672	0	130	92.79%	100.0%	92.79%	96.26%
TITLE	349	0	35	90.89%	100.0%	90.89%	95.23%
IDCARDNUM	1998	0	22	98.91%	100.0%	98.91%	99.45%
GENDER	121	0	0	100.0%	100.0%	100.0%	100.0%
CREDITCARDNUMBER	557	0	1	99.82%	100.0%	99.82%	99.91%
SEX	78	0	1	98.73%	100.0%	98.73%	99.36%
STREET	1368	0	19	98.63%	100.0%	98.63%	99.31%
TAXNUM	345	0	12	96.64%	100.0%	96.64%	98.29%
EMAIL	2606	0	2	99.92%	100.0%	99.92%	99.96%
SOCIALNUM	411	0	11	97.39%	100.0%	97.39%	98.68%
ZIPCODE	406	0	20	95.31%	100.0%	95.31%	97.60%

Overall Evaluation

Accuracy: 99.01%
Precision: 98.72%
Recall: 98.47%
F1 Score: 98.59%
Total True Positives (TP): 30,737
Total False Positives (FP): 400
Total False Negatives (FN): 477
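
The overall precision, recall, and F1 figures follow directly from the three totals above. The snippet below is a minimal sketch of that arithmetic; the helper name `precision_recall_f1` is illustrative and not part of the model's code, and treating an undefined ratio as 0.0 is an assumption. The overall Accuracy figure is not derived here, since the card does not say how it was computed.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) as fractions; undefined ratios become 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Totals copied from the "Overall Evaluation" list.
overall = precision_recall_f1(tp=30_737, fp=400, fn=477)
# One row of the per-label table (SURNAME), for comparison.
surname = precision_recall_f1(tp=3_722, fp=0, fn=28)

if __name__ == "__main__":
    for name, (p, r, f1) in {"overall": overall, "SURNAME": surname}.items():
        print(f"{name}: precision={p:.2%} recall={r:.2%} f1={f1:.2%}")
```

Rounded to two decimals, both results agree with the values reported in this card.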

Macro-Averaged Metrics

Accuracy: 98.35%
Precision: 95.24%
Recall: 93.35%
F1 Score: 94.29%
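
The card does not state how the macro averages were derived. The sketch below shows one reading that comes close to the reported precision, recall, and F1: average over all 21 table rows, count the undefined values of the O (Non-PII) row as 0, and take F1 as the harmonic mean of the macro precision and macro recall. This is an inference from the published numbers, not a documented method, and it does not reproduce the macro Accuracy figure.

```python
# (TP, FP, FN) per label, copied from the per-label table.
COUNTS = {
    "SURNAME": (3722, 0, 28),
    "O (Non-PII)": (0, 400, 0),
    "TIME": (1936, 0, 0),
    "DRIVERLICENSENUM": (505, 0, 2),
    "PASSPORTNUM": (564, 0, 2),
    "GIVENNAME": (7548, 0, 172),
    "TELEPHONENUM": (3641, 0, 0),
    "BUILDINGNUM": (407, 0, 19),
    "AGE": (168, 0, 1),
    "DATE": (2335, 0, 0),
    "CITY": (1672, 0, 130),
    "TITLE": (349, 0, 35),
    "IDCARDNUM": (1998, 0, 22),
    "GENDER": (121, 0, 0),
    "CREDITCARDNUMBER": (557, 0, 1),
    "SEX": (78, 0, 1),
    "STREET": (1368, 0, 19),
    "TAXNUM": (345, 0, 12),
    "EMAIL": (2606, 0, 2),
    "SOCIALNUM": (411, 0, 11),
    "ZIPCODE": (406, 0, 20),
}


def safe_div(num: float, den: float) -> float:
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


# Per-label precision and recall; the O row contributes 0.0 to both.
precisions = [safe_div(tp, tp + fp) for tp, fp, _ in COUNTS.values()]
recalls = [safe_div(tp, tp + fn) for tp, _, fn in COUNTS.values()]

# Unweighted mean over all rows, then harmonic mean for F1.
macro_precision = sum(precisions) / len(COUNTS)
macro_recall = sum(recalls) / len(COUNTS)
macro_f1 = safe_div(2 * macro_precision * macro_recall, macro_precision + macro_recall)

if __name__ == "__main__":
    print(f"macro precision={macro_precision:.2%}")
    print(f"macro recall={macro_recall:.2%}")
    print(f"macro F1={macro_f1:.2%}")
```

Small differences in the last decimal are possible, since the published values may have been computed from already-rounded per-label percentages.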

Model Behavior & Limitations

Evaluation Focus:
The metrics shown above reflect performance on the test split of the open-pii-masking-500k-ai4privacy dataset only. Performance on real-world data may differ, and production use may require additional safeguards beyond the model itself. Feel free to contact [email protected] for assistance.

Disclaimer

This model card details the evaluation metrics and fine-tuning parameters for the multilingual anonymiser. Please note:

The model is provided as-is under the MIT License.
It is intended solely for redaction purposes and does not perform full PII classification.
Users should carefully test and evaluate its performance on their own data before deploying in production environments.
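
As a starting point for testing on your own data, the sketch below shows how redaction could be wired up. It rests on several assumptions: that the repo id at the top of this card loads with the `transformers` token-classification pipeline, that the pipeline is run with `aggregation_strategy="simple"` so each result carries `entity_group`, `start`, and `end`, and that the entity names match the labels in the table. The `redact` and `detect` helpers are hypothetical names, and the sample entities are hand-written so the example runs without downloading the model.

```python
from typing import Iterable, Mapping

# Repo id as it appears at the top of this card (assumed to be loadable from the Hub).
MODEL_ID = "ai4privacy/llama-ai4privacy-multilingual-anonymiser-openpii"


def redact(text: str, entities: Iterable[Mapping]) -> str:
    """Replace each detected character span with a [LABEL] placeholder."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        out = out[: ent["start"]] + f"[{ent['entity_group']}]" + out[ent["end"]:]
    return out


def detect(text: str) -> list:
    """Run the model via a token-classification pipeline (downloads the model)."""
    # Imported lazily so the rest of the example works without transformers installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # merge sub-word tokens into entity spans
    )
    return tagger(text)


# Hand-written entities in the pipeline's output shape, so this runs offline.
sample = "Contact Anna at anna@example.com"
fake_entities = [
    {"entity_group": "GIVENNAME", "start": 8, "end": 12},
    {"entity_group": "EMAIL", "start": 16, "end": 32},
]
redacted = redact(sample, fake_entities)

if __name__ == "__main__":
    print(redacted)
```

With the model available, `redact(text, detect(text))` would apply the same replacement to real predictions; comparing that output against hand-labelled examples from your own domain is one way to carry out the evaluation recommended above.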

Ai4Privacy – Committed to protecting personal data in the age of AI.