Update README.md
README.md
@@ -1,3 +1,85 @@
----
-license: mit
-
+---
+license: mit
+datasets:
+- ai4privacy/open-pii-masking-500k-ai4privacy
+language:
+- fr
+- en
+- de
+- te
+- hi
+- it
+- es
+- nl
+base_model:
+- answerdotai/ModernBERT-base
+library_name: transformers
+tags:
+- PII
+---
+
+## Evaluation Metrics
+
+The table below summarizes the detailed evaluation results per PII label:
+
+| **Label**          | **TP** | **FP** | **FN** | **Accuracy** | **Precision** | **Recall** | **F1 Score** |
+|--------------------|:------:|:------:|:------:|:------------:|:-------------:|:----------:|:------------:|
+| SURNAME            | 3722   | 0      | 28     | 99.25%       | 100.0%        | 99.25%     | 99.63%       |
+| O (Non-PII)        | 0      | 400    | 0      | 99.30%       | n/a           | n/a        | n/a          |
+| TIME               | 1936   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
+| DRIVERLICENSENUM   | 505    | 0      | 2      | 99.61%       | 100.0%        | 99.61%     | 99.80%       |
+| PASSPORTNUM        | 564    | 0      | 2      | 99.65%       | 100.0%        | 99.65%     | 99.82%       |
+| GIVENNAME          | 7548   | 0      | 172    | 97.77%       | 100.0%        | 97.77%     | 98.87%       |
+| TELEPHONENUM       | 3641   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
+| BUILDINGNUM        | 407    | 0      | 19     | 95.54%       | 100.0%        | 95.54%     | 97.72%       |
+| AGE                | 168    | 0      | 1      | 99.41%       | 100.0%        | 99.41%     | 99.70%       |
+| DATE               | 2335   | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
+| CITY               | 1672   | 0      | 130    | 92.79%       | 100.0%        | 92.79%     | 96.26%       |
+| TITLE              | 349    | 0      | 35     | 90.89%       | 100.0%        | 90.89%     | 95.23%       |
+| IDCARDNUM          | 1998   | 0      | 22     | 98.91%       | 100.0%        | 98.91%     | 99.45%       |
+| GENDER             | 121    | 0      | 0      | 100.0%       | 100.0%        | 100.0%     | 100.0%       |
+| CREDITCARDNUMBER   | 557    | 0      | 1      | 99.82%       | 100.0%        | 99.82%     | 99.91%       |
+| SEX                | 78     | 0      | 1      | 98.73%       | 100.0%        | 98.73%     | 99.36%       |
+| STREET             | 1368   | 0      | 19     | 98.63%       | 100.0%        | 98.63%     | 99.31%       |
+| TAXNUM             | 345    | 0      | 12     | 96.64%       | 100.0%        | 96.64%     | 98.29%       |
+| EMAIL              | 2606   | 0      | 2      | 99.92%       | 100.0%        | 99.92%     | 99.96%       |
+| SOCIALNUM          | 411    | 0      | 11     | 97.39%       | 100.0%        | 97.39%     | 98.68%       |
+| ZIPCODE            | 406    | 0      | 20     | 95.31%       | 100.0%        | 95.31%     | 97.60%       |
+
+### Overall Evaluation
+- **Accuracy:** 99.01%
+- **Precision:** 98.72%
+- **Recall:** 98.47%
+- **F1 Score:** 98.59%
+
+- **Total True Positives (TP):** 30,737
+- **Total False Positives (FP):** 400
+- **Total False Negatives (FN):** 477
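As a quick consistency check, the overall precision, recall, and F1 figures above can be reproduced from the three totals, assuming they are micro-averaged (computed from pooled counts). The README does not state the averaging method, so this is a sketch under that assumption, and the helper name `prf1` is illustrative. Accuracy is left out because it also depends on true negatives, which the README does not report.

```python
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) as fractions computed from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


# Totals as reported in the README (assumed to be pooled over all labels).
precision, recall, f1 = prf1(tp=30_737, fp=400, fn=477)
print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%}")
# prints: precision=98.72% recall=98.47% f1=98.59%
```

The same helper applied to a single row, for example `prf1(3722, 0, 28)` for SURNAME, gives the per-label recall and F1 shown in the table.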
+
+### Macro-Averaged Metrics
+- **Accuracy:** 98.35%
+- **Precision:** 95.24%
+- **Recall:** 93.35%
+- **F1 Score:** 94.29%
+
+---
+
+## Model Behavior & Limitations
+
+- **Evaluation Focus:**
+The metrics above reflect performance on the test split of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset. Real-world performance may differ, so deployments need additional safeguards and evaluation on their own data. Feel free to contact [email protected] for assistance.
+
+---
+
+## Disclaimer
+
+This model card details the evaluation metrics and fine-tuning parameters for the multilingual anonymiser. **Please note:**
+- The model is provided **as-is** under the MIT License.
+- It is intended solely for redaction purposes and does not perform full PII classification.
+- Users should carefully test and evaluate its performance on their own data before deploying it in production environments.
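Because the model is meant for redaction, the step after inference is typically replacing each predicted span with a placeholder. The sketch below assumes predictions arrive as dictionaries with `entity_group`, `start`, and `end` keys, which is the layout the `transformers` token-classification pipeline returns when an aggregation strategy is set. The `redact` helper, the sample sentence, and the hand-written spans are illustrative only and are not output from this model.

```python
def redact(text: str, spans: list[dict]) -> str:
    """Replace each predicted character span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        placeholder = f"[{span['entity_group']}]"
        text = text[: span["start"]] + placeholder + text[span["end"]:]
    return text


# Hand-written spans standing in for model predictions (hypothetical example).
sample = "Contact Maria Rossi at maria@example.com."
predicted = [
    {"entity_group": "GIVENNAME", "start": 8, "end": 13},
    {"entity_group": "SURNAME", "start": 14, "end": 19},
    {"entity_group": "EMAIL", "start": 23, "end": 40},
]
print(redact(sample, predicted))
# prints: Contact [GIVENNAME] [SURNAME] at [EMAIL].
```

This version does not handle overlapping spans; running it on your own data and comparing against known PII is one way to carry out the testing the disclaimer asks for.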
+
+---
+
+*Ai4Privacy – Committed to protecting personal data in the age of AI.*
+
+---