Token Classification
Transformers
ONNX
Safetensors
modernbert
PII
Commit 87889fd by MikeDoes · verified · 1 Parent(s): 608e5e7

Update README.md

  1. README.md +85 -3
README.md CHANGED
@@ -1,3 +1,85 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - ai4privacy/open-pii-masking-500k-ai4privacy
+ language:
+ - fr
+ - en
+ - de
+ - te
+ - hi
+ - it
+ - es
+ - nl
+ base_model:
+ - answerdotai/ModernBERT-base
+ library_name: transformers
+ tags:
+ - PII
+ ---
+
+ ## Evaluation Metrics
+
+ The table below summarizes the detailed evaluation results per PII label:
+
+ | **Label** | **TP** | **FP** | **FN** | **Accuracy** | **Precision** | **Recall** | **F1 Score** |
+ |--------------------|:------:|:------:|:------:|:------------:|:-------------:|:----------:|:------------:|
+ | SURNAME | 3722 | 0 | 28 | 99.25% | 100.0% | 99.25% | 99.63% |
+ | O (Non-PII) | 0 | 400 | 0 | 99.30% | n/a | n/a | n/a |
+ | TIME | 1936 | 0 | 0 | 100.0% | 100.0% | 100.0% | 100.0% |
+ | DRIVERLICENSENUM | 505 | 0 | 2 | 99.61% | 100.0% | 99.61% | 99.80% |
+ | PASSPORTNUM | 564 | 0 | 2 | 99.65% | 100.0% | 99.65% | 99.82% |
+ | GIVENNAME | 7548 | 0 | 172 | 97.77% | 100.0% | 97.77% | 98.87% |
+ | TELEPHONENUM | 3641 | 0 | 0 | 100.0% | 100.0% | 100.0% | 100.0% |
+ | BUILDINGNUM | 407 | 0 | 19 | 95.54% | 100.0% | 95.54% | 97.72% |
+ | AGE | 168 | 0 | 1 | 99.41% | 100.0% | 99.41% | 99.70% |
+ | DATE | 2335 | 0 | 0 | 100.0% | 100.0% | 100.0% | 100.0% |
+ | CITY | 1672 | 0 | 130 | 92.79% | 100.0% | 92.79% | 96.26% |
+ | TITLE | 349 | 0 | 35 | 90.89% | 100.0% | 90.89% | 95.23% |
+ | IDCARDNUM | 1998 | 0 | 22 | 98.91% | 100.0% | 98.91% | 99.45% |
+ | GENDER | 121 | 0 | 0 | 100.0% | 100.0% | 100.0% | 100.0% |
+ | CREDITCARDNUMBER | 557 | 0 | 1 | 99.82% | 100.0% | 99.82% | 99.91% |
+ | SEX | 78 | 0 | 1 | 98.73% | 100.0% | 98.73% | 99.36% |
+ | STREET | 1368 | 0 | 19 | 98.63% | 100.0% | 98.63% | 99.31% |
+ | TAXNUM | 345 | 0 | 12 | 96.64% | 100.0% | 96.64% | 98.29% |
+ | EMAIL | 2606 | 0 | 2 | 99.92% | 100.0% | 99.92% | 99.96% |
+ | SOCIALNUM | 411 | 0 | 11 | 97.39% | 100.0% | 97.39% | 98.68% |
+ | ZIPCODE | 406 | 0 | 20 | 95.31% | 100.0% | 95.31% | 97.60% |
+
+ ### Overall Evaluation
+ - **Accuracy:** 99.01%
+ - **Precision:** 98.72%
+ - **Recall:** 98.47%
+ - **F1 Score:** 98.59%
+
+ - **Total True Positives (TP):** 30,737
+ - **Total False Positives (FP):** 400
+ - **Total False Negatives (FN):** 477
+
+ ### Macro-Averaged Metrics
+ - **Accuracy:** 98.35%
+ - **Precision:** 95.24%
+ - **Recall:** 93.35%
+ - **F1 Score:** 94.29%
+
+ ---
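The overall precision, recall, and F1 figures above appear consistent with micro-averaging over the reported TP/FP/FN totals. The sketch below shows that arithmetic under that assumption. The `Counts` class and the helper names are illustrative and do not come from the model card, and the sketch does not attempt to reproduce the accuracy figures, whose exact definition the README does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Counts:
    """Confusion counts for one label (hypothetical helper, not from the README)."""
    tp: int
    fp: int
    fn: int


def precision(c: Counts) -> Optional[float]:
    # Undefined when nothing was predicted for this label.
    denom = c.tp + c.fp
    return c.tp / denom if denom else None


def recall(c: Counts) -> Optional[float]:
    # Undefined when the label never occurs in the reference data.
    denom = c.tp + c.fn
    return c.tp / denom if denom else None


def f1(c: Counts) -> Optional[float]:
    # Equivalent to the harmonic mean of precision and recall.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else None


def pct(value: float) -> float:
    """Express a ratio as a percentage rounded to two decimals."""
    return round(value * 100, 2)


# Totals taken from the "Overall Evaluation" list.
overall = Counts(tp=30_737, fp=400, fn=477)
# One row of the per-label table.
surname = Counts(tp=3722, fp=0, fn=28)

print(pct(precision(overall)), pct(recall(overall)), pct(f1(overall)))
print(pct(recall(surname)), pct(f1(surname)))
```

Running the same helpers over other rows of the table is a quick way to check a per-label figure before quoting it elsewhere.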
+
+ ## Model Behavior & Limitations
+
+ - **Evaluation Focus:**
+ The metrics above reflect performance on the test split of the [open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy) dataset. Real-world performance may vary, and production use requires additional safeguards. For assistance, contact [email protected].
+
+ ---
+
+ ## Disclaimer
+
+ This model card details the evaluation metrics for the multilingual anonymiser. **Please note:**
+ - The model is provided **as-is** under the MIT License.
+ - It is intended solely for redaction and does not perform full PII classification.
+ - Users should carefully test and evaluate its performance on their own data before deploying it in production environments.
+
+ ---
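Since the card describes the model as intended for redaction, a minimal sketch of that step may help when testing it on your own data. It assumes entity spans shaped like the output of the `transformers` token-classification pipeline with `aggregation_strategy="simple"` (dicts with `entity_group`, `score`, `start`, `end`). The `redact` function, the `span` helper, the sample text, and the scores are all hypothetical, and the README does not confirm that the model emits these exact label strings.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], min_score: float = 0.0) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    Assumes spans do not overlap. Spans are applied right to left so that
    earlier character offsets stay valid while the text changes length.
    """
    kept = [e for e in entities if e["score"] >= min_score]
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# In practice the entities would come from the model, for example:
#   from transformers import pipeline
#   tagger = pipeline("token-classification", model=MODEL_ID,
#                     aggregation_strategy="simple")
#   entities = tagger(sample_text)
# MODEL_ID is a placeholder; the repository id is not given in this diff.

sample_text = "Contact Anna Schmidt at anna@example.com."


def span(label: str, value: str, score: float = 0.99) -> Dict:
    """Build a fake pipeline-style entity for the sample text (illustration only)."""
    start = sample_text.index(value)
    return {"entity_group": label, "score": score,
            "start": start, "end": start + len(value)}


sample_entities = [
    span("GIVENNAME", "Anna"),
    span("SURNAME", "Schmidt"),
    span("EMAIL", "anna@example.com"),
]

print(redact(sample_text, sample_entities))
```

The `min_score` threshold is one place to trade recall for precision when evaluating on your own data; the default of `0.0` keeps every span.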
+
+ *Ai4Privacy – Committed to protecting personal data in the age of AI.*
+
+ ---