Commit a7cc914 by dominguesm (1 parent: 8519bc1)

Readme update

Files changed (1): README.md +120 -0
@@ -38,3 +38,123 @@ widget:
  - text: "na quinta feira em visita a belo horizonte pedro sobrevoa a cidade atingida pelas chuvas"
  - text: "coube ao representante de classe contar que na avaliação de língua portuguesa alguns alunos se mantiveram concentrados e outros dispersos"
  ---
# bert-restore-punctuation-ptbr

**Coming soon: a Python package for simpler use.**

This is a bert-base-portuguese-cased model fine-tuned for punctuation restoration on [WikiLingua](https://github.com/esdurmus/Wikilingua).

This model is intended for direct use as a punctuation restoration model for general Portuguese text. Alternatively, you can fine-tune it further on domain-specific texts for punctuation restoration tasks.

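
As a rough sketch of direct use, the checkpoint can presumably be loaded with the Hugging Face `transformers` token-classification pipeline. Two things here are assumptions rather than facts from this README: the Hub id `dominguesm/bert-restore-punctuation-ptbr` (pieced together from the commit author and the model name) and `aggregation_strategy="simple"`, chosen because it returns grouped entries with `entity_group`, `score`, `word`, `start` and `end` fields like those shown in the Output section. `build_tagger` is a hypothetical helper name.

```python
# Hypothetical Hub id: assumed from the commit author and model name,
# not confirmed anywhere in this README.
MODEL_ID = "dominguesm/bert-restore-punctuation-ptbr"


def build_tagger(model_id: str = MODEL_ID):
    """Return a token-classification pipeline for the given checkpoint."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge adjacent tokens that share a label
    )


# Usage (downloads the checkpoint on first call):
#   tagger = build_tagger()
#   groups = tagger("henrique foi no lago pescar com o pedro")
```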
The model restores the following punctuation marks: **[! ? . , - : ; ' ]**

The model also restores the upper-casing of words.

-----------------------------------------------
## Accuracy

Each label is a two-character code: the punctuation mark that follows the word (`O` for none), then `U` if the word is upper-cased (`O` if not).

| label                             | precision | recall | f1-score | support |
| --------------------------------- | --------- | ------ | -------- | ------- |
| **Upper - OU**                    | 0.89      | 0.91   | 0.90     | 69376   |
| **None - OO**                     | 0.99      | 0.98   | 0.98     | 857659  |
| **Full stop/period - .O**         | 0.86      | 0.93   | 0.89     | 60410   |
| **Comma - ,O**                    | 0.85      | 0.83   | 0.84     | 48608   |
| **Upper + Comma - ,U**            | 0.73      | 0.76   | 0.75     | 3521    |
| **Question Mark - ?O**            | 0.68      | 0.78   | 0.73     | 1168    |
| **Upper + Period - .U**           | 0.66      | 0.72   | 0.69     | 1884    |
| **Upper + Colon - :U**            | 0.59      | 0.63   | 0.61     | 352     |
| **Colon - :O**                    | 0.70      | 0.53   | 0.60     | 2420    |
| **Upper + Question Mark - ?U**    | 0.50      | 0.56   | 0.53     | 36      |
| **Upper + Exclamation Mark - !U** | 0.38      | 0.32   | 0.34     | 38      |
| **Exclamation Mark - !O**         | 0.30      | 0.05   | 0.08     | 783     |
| **Semicolon - ;O**                | 0.35      | 0.04   | 0.08     | 1557    |
| **Apostrophe - 'O**               | 0.00      | 0.00   | 0.00     | 3       |
| **Hyphen - -O**                   | 0.00      | 0.00   | 0.00     | 3       |
|                                   |           |        |          |         |
| **accuracy**                      |           |        | 0.96     | 1047818 |
| **macro avg**                     | 0.57      | 0.54   | 0.54     | 1047818 |
| **weighted avg**                  | 0.96      | 0.96   | 0.96     | 1047818 |

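
The gap between the macro average (0.54) and the weighted average (0.96) comes from class imbalance: the macro average gives the 3-sample apostrophe label the same weight as the 857659-sample `OO` label, while the weighted average scales each label by its support. A small sketch that recomputes both F1 averages from the rounded per-label values in the table above; because the inputs are already rounded, the results can differ from the reported figures in the last digit.

```python
# (f1-score, support) per label code, copied from the table above.
rows = {
    "OU": (0.90, 69376),
    "OO": (0.98, 857659),
    ".O": (0.89, 60410),
    ",O": (0.84, 48608),
    ",U": (0.75, 3521),
    "?O": (0.73, 1168),
    ".U": (0.69, 1884),
    ":U": (0.61, 352),
    ":O": (0.60, 2420),
    "?U": (0.53, 36),
    "!U": (0.34, 38),
    "!O": (0.08, 783),
    ";O": (0.08, 1557),
    "'O": (0.00, 3),
    "-O": (0.00, 3),
}

total_support = sum(support for _, support in rows.values())

# Macro average: every label counts equally, however rare it is.
macro_f1 = sum(f1 for f1, _ in rows.values()) / len(rows)

# Weighted average: every label counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in rows.values()) / total_support

print(f"support={total_support} macro={macro_f1:.2f} weighted={weighted_f1:.2f}")
```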
-----------------------------------------------
## Output

Example:

```json
[
  {
    "entity_group": "OU",
    "score": 0.8026431202888489,
    "word": "henrique",
    "start": 0,
    "end": 8
  },
  {
    "entity_group": "OO",
    "score": 0.9925149083137512,
    "word": "foi no lago pescar com o",
    "start": 9,
    "end": 33
  },
  {
    "entity_group": ".U",
    "score": 0.8426014184951782,
    "word": "pedro",
    "start": 34,
    "end": 39
  },
  {
    "entity_group": "OU",
    "score": 0.9519776105880737,
    "word": "mais",
    "start": 40,
    "end": 44
  },
  {
    "entity_group": ",O",
    "score": 0.8551820516586304,
    "word": "tarde",
    "start": 45,
    "end": 50
  },
  {
    "entity_group": "OO",
    "score": 0.9902807474136353,
    "word": "foram para a casa do",
    "start": 51,
    "end": 71
  },
  {
    "entity_group": "OU",
    "score": 0.9227372407913208,
    "word": "pedro",
    "start": 72,
    "end": 77
  },
  {
    "entity_group": "OO",
    "score": 0.9997054934501648,
    "word": "fritar os",
    "start": 78,
    "end": 87
  },
  {
    "entity_group": ".O",
    "score": 0.9813661575317383,
    "word": "peixes",
    "start": 88,
    "end": 94
  }
]
```

This output corresponds to the following restored text:

```
Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
```
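
The mapping from the grouped output to the sentence above can be sketched as follows. This is an illustration under stated assumptions, not the author's code: each `entity_group` is taken to be a two-character code (punctuation mark or `O`, then `U` or `O`), `U` is taken to mean capitalizing the first letter, and the mark is appended directly after the word. `restore` is a hypothetical helper, and apostrophes or hyphens would need different spacing rules than the ones used here.

```python
def restore(groups):
    """Rebuild cased, punctuated text from grouped token-classification output."""
    words = []
    for group in groups:
        punct, case = group["entity_group"]  # two-character label, e.g. ".U"
        # A group may span several words that all share the same label.
        for word in group["word"].split():
            if case == "U":
                # Assumption: "upper" means capitalizing the first letter only.
                word = word[:1].upper() + word[1:]
            if punct != "O":
                # Assumption: the mark directly follows the word it is attached to.
                word += punct
            words.append(word)
    return " ".join(words)


# Trimmed copy of the example output above (scores and offsets omitted).
example = [
    {"entity_group": "OU", "word": "henrique"},
    {"entity_group": "OO", "word": "foi no lago pescar com o"},
    {"entity_group": ".U", "word": "pedro"},
    {"entity_group": "OU", "word": "mais"},
    {"entity_group": ",O", "word": "tarde"},
    {"entity_group": "OO", "word": "foram para a casa do"},
    {"entity_group": "OU", "word": "pedro"},
    {"entity_group": "OO", "word": "fritar os"},
    {"entity_group": ".O", "word": "peixes"},
]

print(restore(example))
```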
-----------------------------------------------

## Contact

Contact [Maicon Domingues]([email protected]) with questions, feedback, or requests for similar models.