---
license: mit
datasets:
- ai4privacy/open-pii-masking-500k-ai4privacy
language:
- fr
- en
- de
- te
- hi
- it
- es
- nl
base_model:
- answerdotai/ModernBERT-base
library_name: transformers
tags:
- PII
---

## Evaluation Metrics

The table below summarizes the detailed evaluation results per PII label:

| **Label**          | **TP** | **FP** | **FN** | **Accuracy** | **Precision** | **Recall** | **F1 Score** |
|--------------------|:------:|:------:|:------:|:------------:|:-------------:|:----------:|:------------:|
| SURNAME            | 3722   | 0      | 28     | 99.25%       | 100.0%        | 99.25%     | 99.63%       |
| O (Non-PII)        | 0      | 400    | 0      | 99.30%       | n/a           | n/a        | n/a          |
| TIME               | 1936   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
| DRIVERLICENSENUM   | 505    | 0      | 2      | 99.61%       | 100.0%        | 99.61%     | 99.80%       |
| PASSPORTNUM        | 564    | 0      | 2      | 99.65%       | 100.0%        | 99.65%     | 99.82%       |
| GIVENNAME          | 7548   | 0      | 172    | 97.77%       | 100.0%        | 97.77%     | 98.87%       |
| TELEPHONENUM       | 3641   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
| BUILDINGNUM        | 407    | 0      | 19     | 95.54%       | 100.0%        | 95.54%     | 97.72%       |
| AGE                | 168    | 0      | 1      | 99.41%       | 100.0%        | 99.41%     | 99.70%       |
| DATE               | 2335   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
| CITY               | 1672   | 0      | 130    | 92.79%       | 100.0%        | 92.79%     | 96.26%       |
| TITLE              | 349    | 0      | 35     | 90.89%       | 100.0%        | 90.89%     | 95.23%       |
| IDCARDNUM          | 1998   | 0      | 22     | 98.91%       | 100.0%        | 98.91%     | 99.45%       |
| GENDER             | 121    | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
| CREDITCARDNUMBER   | 557    | 0      | 1      | 99.82%       | 100.0%        | 99.82%     | 99.91%       |
| SEX                | 78     | 0      | 1      | 98.73%       | 100.0%        | 98.73%     | 99.36%       |
| STREET             | 1368   | 0      | 19     | 98.63%       | 100.0%        | 98.63%     | 99.31%       |
| TAXNUM             | 345    | 0      | 12     | 96.64%       | 100.0%        | 96.64%     | 98.29%       |
| EMAIL              | 2606   | 0      | 2      | 99.92%       | 100.0%        | 99.92%     | 99.96%       |
| SOCIALNUM          | 411    | 0      | 11     | 97.39%       | 100.0%        | 97.39%     | 98.68%       |
| ZIPCODE            | 406    | 0      | 20     | 95.31%       | 100.0%        | 95.31%     | 97.60%       |

### Overall Evaluation

- **Accuracy:** 99.01%
- **Precision:** 98.72%
- **Recall:** 98.47%
- **F1 Score:** 98.59%
- **Total True Positives (TP):** 30,737
- **Total False Positives (FP):** 400
- **Total False Negatives (FN):** 477

### Macro-Averaged Metrics

- **Accuracy:** 98.35%
- **Precision:** 95.24%
- **Recall:** 93.35%
- **F1 Score:** 94.29%

---

## Model Behavior & Limitations

- **Evaluation Focus:** The metrics shown above reflect performance on the test split of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset. Real-world performance may vary and requires additional measures. Feel free to contact support@ai4privacy.com for assistance.

---

## Disclaimer

This model card details the evaluation metrics and fine-tuning parameters for the multilingual anonymiser.

**Please note:**

- The model is provided **as-is** under the MIT License.
- It is intended solely for redaction purposes and does not perform full PII classification.
- Users should carefully test and evaluate its performance on their own data before deploying in production environments.

---

*Ai4Privacy – Committed to protecting personal data in the age of AI.*

---
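
As a cross-check on the Overall Evaluation figures, the sketch below recomputes precision, recall, and F1 from the reported TP, FP, and FN totals using the standard definitions. The card does not state how these figures were derived; the assumption here is that they come from pooled (micro-averaged) counts, and the helper name `prf` is purely illustrative.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1) as fractions in [0, 1].

    Standard definitions; a zero denominator yields 0.0 instead of an error.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Totals copied from the "Overall Evaluation" section of this card.
TOTAL_TP, TOTAL_FP, TOTAL_FN = 30_737, 400, 477

precision, recall, f1 = prf(TOTAL_TP, TOTAL_FP, TOTAL_FN)
print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%}")
# prints: precision=98.72% recall=98.47% f1=98.59%

# The same helper applied to a single table row (CITY: TP=1672, FP=0, FN=130).
city_precision, city_recall, city_f1 = prf(1672, 0, 130)
```

The pooled counts reproduce the reported overall precision, recall, and F1, and the CITY row reproduces its table entries (100.0%, 92.79%, 96.26%). The same function can be pointed at counts measured on your own data when evaluating the model before deployment.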