dominguesm committed a7cc914 (parent: 8519bc1): Readme update

README.md (changed)

@@ -38,3 +38,123 @@ widget:
- text: "na quinta feira em visita a belo horizonte pedro sobrevoa a cidade atingida pelas chuvas"
- text: "coube ao representante de classe contar que na avaliação de língua portuguesa alguns alunos se mantiveram concentrados e outros dispersos"
---
# bert-restore-punctuation-ptbr

**Coming soon: a Python package for simpler use.**

This is a `bert-base-portuguese-cased` model fine-tuned for punctuation restoration on [WikiLingua](https://github.com/esdurmus/Wikilingua).

The model is intended for direct use as a punctuation restoration model for general Portuguese text. Alternatively, it can serve as a starting point for further fine-tuning on domain-specific texts for punctuation restoration.

The model restores the following punctuation marks: **! ? . , - : ; '**

The model also restores word capitalization.
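The label names in the evaluation table (for example `,U` for *Upper + Comma*) suggest that each word gets a two-character label. The first character is the punctuation mark appended after the word, with `O` meaning none. The second character is `U` if the word is capitalized and `O` otherwise.

The Python sketch below assumes exactly that scheme. It is an illustration, not the author's preprocessing code, which this README does not include.

- `encode` turns punctuated text into (word, label) pairs, such as one might prepare for domain-specific fine-tuning.
- `decode` applies labels back to lower-cased words.
- Both function names are hypothetical.

```python
# Assumed label scheme (inferred from the README's label names, not an official spec):
#   label[0] = punctuation mark appended after the word, or "O" for none
#   label[1] = "U" if the word starts with an upper-case letter, else "O"
PUNCT = set("!?.,-:;'")


def encode(text: str) -> list:
    """Turn punctuated text into (lower-cased word, label) pairs."""
    pairs = []
    for token in text.split():
        punct = "O"
        # Only one trailing mark per word is captured in this sketch.
        if token[-1] in PUNCT:
            punct, token = token[-1], token[:-1]
        if not token:
            continue  # a stand-alone mark such as " - " is dropped in this sketch
        case = "U" if token[0].isupper() else "O"
        pairs.append((token.lower(), punct + case))
    return pairs


def decode(pairs: list) -> str:
    """Apply labels to lower-cased words and join them back into text."""
    words = []
    for word, label in pairs:
        punct, case = label[0], label[1]
        if case == "U":
            word = word[0].upper() + word[1:]  # capitalize the first letter only
        if punct != "O":
            word += punct
        words.append(word)
    return " ".join(words)


sentence = "Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes."
pairs = encode(sentence)
print(pairs[:3])  # [('henrique', 'OU'), ('foi', 'OO'), ('no', 'OO')]
print(decode(pairs) == sentence)  # True
```

Because the scheme only records whether the first letter is upper-case, mixed-case words such as acronyms would not survive a round trip through `encode` and `decode`.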

-----------------------------------------------

## Accuracy

Each label pairs the punctuation mark that follows a word (`O` = none) with the word's casing (`U` = capitalized, `O` = unchanged).

| label                             | precision | recall | f1-score | support |
| --------------------------------- | --------- | ------ | -------- | ------- |
| **Upper - OU**                    | 0.89      | 0.91   | 0.90     | 69376   |
| **None - OO**                     | 0.99      | 0.98   | 0.98     | 857659  |
| **Full stop/period - .O**         | 0.86      | 0.93   | 0.89     | 60410   |
| **Comma - ,O**                    | 0.85      | 0.83   | 0.84     | 48608   |
| **Upper + Comma - ,U**            | 0.73      | 0.76   | 0.75     | 3521    |
| **Question Mark - ?O**            | 0.68      | 0.78   | 0.73     | 1168    |
| **Upper + Period - .U**           | 0.66      | 0.72   | 0.69     | 1884    |
| **Upper + Colon - :U**            | 0.59      | 0.63   | 0.61     | 352     |
| **Colon - :O**                    | 0.70      | 0.53   | 0.60     | 2420    |
| **Upper + Question Mark - ?U**    | 0.50      | 0.56   | 0.53     | 36      |
| **Upper + Exclamation Mark - !U** | 0.38      | 0.32   | 0.34     | 38      |
| **Exclamation Mark - !O**         | 0.30      | 0.05   | 0.08     | 783     |
| **Semicolon - ;O**                | 0.35      | 0.04   | 0.08     | 1557    |
| **Apostrophe - 'O**               | 0.00      | 0.00   | 0.00     | 3       |
| **Hyphen - -O**                   | 0.00      | 0.00   | 0.00     | 3       |
| **accuracy**                      |           |        | 0.96     | 1047818 |
| **macro avg**                     | 0.57      | 0.54   | 0.54     | 1047818 |
| **weighted avg**                  | 0.96      | 0.96   | 0.96     | 1047818 |
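The summary rows follow the usual definitions:

- The macro average is the unweighted mean of the per-label scores.
- The weighted average weights each label by its support.

The minimal Python check below re-derives both averages from the per-label values in the table above. Those values are already rounded to two decimals, so recomputed averages can differ from the reported ones in the last digit.

```python
# Per-label scores copied from the table above: (precision, recall, f1, support).
rows = {
    "OU": (0.89, 0.91, 0.90, 69376),
    "OO": (0.99, 0.98, 0.98, 857659),
    ".O": (0.86, 0.93, 0.89, 60410),
    ",O": (0.85, 0.83, 0.84, 48608),
    ",U": (0.73, 0.76, 0.75, 3521),
    "?O": (0.68, 0.78, 0.73, 1168),
    ".U": (0.66, 0.72, 0.69, 1884),
    ":U": (0.59, 0.63, 0.61, 352),
    ":O": (0.70, 0.53, 0.60, 2420),
    "?U": (0.50, 0.56, 0.53, 36),
    "!U": (0.38, 0.32, 0.34, 38),
    "!O": (0.30, 0.05, 0.08, 783),
    ";O": (0.35, 0.04, 0.08, 1557),
    "'O": (0.00, 0.00, 0.00, 3),
    "-O": (0.00, 0.00, 0.00, 3),
}

total_support = sum(r[3] for r in rows.values())


def macro_avg(i: int) -> float:
    """Unweighted mean over labels: every label counts equally."""
    return sum(r[i] for r in rows.values()) / len(rows)


def weighted_avg(i: int) -> float:
    """Support-weighted mean: frequent labels such as OO dominate."""
    return sum(r[i] * r[3] for r in rows.values()) / total_support


print(total_support)  # 1047818
for name, i in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name}: macro={macro_avg(i):.3f} weighted={weighted_avg(i):.3f}")
```

The weighted-average recall always equals overall accuracy, because each label's recall times its support is its true-positive count. This is why both read 0.96.

The macro average is much lower because rare labels such as `!O`, `;O`, `'O` and `-O` count as much as `OO`.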

-----------------------------------------------

## Output

Example output for the input `henrique foi no lago pescar com o pedro mais tarde foram para a casa do pedro fritar os peixes`:
```json
[
  {
    "entity_group": "OU",
    "score": 0.8026431202888489,
    "word": "henrique",
    "start": 0,
    "end": 8
  },
  {
    "entity_group": "OO",
    "score": 0.9925149083137512,
    "word": "foi no lago pescar com o",
    "start": 9,
    "end": 33
  },
  {
    "entity_group": ".U",
    "score": 0.8426014184951782,
    "word": "pedro",
    "start": 34,
    "end": 39
  },
  {
    "entity_group": "OU",
    "score": 0.9519776105880737,
    "word": "mais",
    "start": 40,
    "end": 44
  },
  {
    "entity_group": ",O",
    "score": 0.8551820516586304,
    "word": "tarde",
    "start": 45,
    "end": 50
  },
  {
    "entity_group": "OO",
    "score": 0.9902807474136353,
    "word": "foram para a casa do",
    "start": 51,
    "end": 71
  },
  {
    "entity_group": "OU",
    "score": 0.9227372407913208,
    "word": "pedro",
    "start": 72,
    "end": 77
  },
  {
    "entity_group": "OO",
    "score": 0.9997054934501648,
    "word": "fritar os",
    "start": 78,
    "end": 87
  },
  {
    "entity_group": ".O",
    "score": 0.9813661575317383,
    "word": "peixes",
    "start": 88,
    "end": 94
  }
]
```

This output corresponds to the following restored text (in English: "Henrique went fishing at the lake with Pedro. Later, they went to Pedro's house to fry the fish."):

```
Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
```
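In the example output, each `entity_group` value is a two-character label. The first character is the punctuation mark to append, with `O` meaning none. The second is `U` to capitalize or `O` to leave the word unchanged. `start` and `end` are character offsets into the input, so the restored sentence can be rebuilt mechanically.

The Python sketch below assumes each group's label applies to every word the group spans. It slices the original input by offsets rather than using `word`, which is decoded from sub-word tokens.

The commented `pipeline(...)` call shows one assumed way to obtain such groups with Hugging Face Transformers. The model id and aggregation strategy there are guesses, not something this README documents.

```python
# Assumption: groups like these can be produced with Hugging Face Transformers, e.g.
#   from transformers import pipeline
#   nlp = pipeline("token-classification",
#                  model="dominguesm/bert-restore-punctuation-ptbr",  # assumed model id
#                  aggregation_strategy="average")                     # assumed strategy
#   groups = nlp(text)

text = "henrique foi no lago pescar com o pedro mais tarde foram para a casa do pedro fritar os peixes"

# Labels and character offsets copied from the example output (scores omitted).
groups = [
    {"entity_group": "OU", "start": 0, "end": 8},
    {"entity_group": "OO", "start": 9, "end": 33},
    {"entity_group": ".U", "start": 34, "end": 39},
    {"entity_group": "OU", "start": 40, "end": 44},
    {"entity_group": ",O", "start": 45, "end": 50},
    {"entity_group": "OO", "start": 51, "end": 71},
    {"entity_group": "OU", "start": 72, "end": 77},
    {"entity_group": "OO", "start": 78, "end": 87},
    {"entity_group": ".O", "start": 88, "end": 94},
]


def restore(text: str, groups: list) -> str:
    """Rebuild punctuated, capitalized text from aggregated entity groups."""
    out = []
    for group in sorted(groups, key=lambda g: g["start"]):
        punct, case = group["entity_group"][0], group["entity_group"][1]
        # Slice the input by character offsets rather than trusting "word",
        # which is decoded from sub-word tokens.
        for word in text[group["start"]:group["end"]].split():
            if case == "U":
                word = word[0].upper() + word[1:]
            if punct != "O":
                word += punct
            out.append(word)
    return " ".join(out)


print(restore(text, groups))
# Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
```

Words that fall outside every group would be dropped by this sketch. With the example output above, the groups cover the whole input.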

-----------------------------------------------

## Contact

Reach out to [Maicon Domingues]([email protected]) with questions, feedback, or requests for similar models.