---
language:
- pt
license: cc-by-4.0
datasets:
- wiki_lingua
thumbnail: null
tags:
- named-entity-recognition
- Transformer
- pytorch
- bert
metrics:
- f1
- precision
- recall
model-index:
- name: rpunct-ptbr
  results:
  - task:
      type: named-entity-recognition
    dataset:
      type: wiki_lingua
      name: wiki_lingua
    metrics:
      - type: f1
        value: 55.70
        name: F1 Score
      - type: precision
        value: 57.72
        name: Precision
      - type: recall
        value: 53.83
        name: Recall
widget:
- text: "henrique foi no lago pescar com o pedro mais tarde foram para a casa do pedro fritar os peixes"
- text: "cinco trabalhadores da construção civil em capacetes e coletes amarelos estão ocupados no trabalho"
- text: "na quinta feira em visita a belo horizonte pedro sobrevoa a cidade atingida pelas chuvas"
- text: "coube ao representante de classe contar que na avaliação de língua portuguesa alguns alunos se mantiveram concentrados e outros dispersos"
---
# 🤗 bert-restore-punctuation-ptbr


* 🪄 [W&B Dashboard](https://wandb.ai/dominguesm/RestorePunctuationPTBR)


**A Python package for simpler use is coming soon.**

This is a [bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) model fine-tuned for punctuation restoration on [WikiLingua](https://github.com/esdurmus/Wikilingua).

This model is intended for direct use as a punctuation restoration model for general Portuguese text. It can also serve as a starting point for further fine-tuning on domain-specific texts for punctuation restoration tasks.

The model restores the following punctuation marks: **[! ? . , - : ; ' ]**

The model also restores the upper-casing of words.
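
A minimal sketch of how the model could be called, assuming it is loaded through the `transformers` token-classification pipeline with an aggregation strategy (the `entity_group` fields in the Output section suggest aggregated predictions). The function name `tag_text` and the `model_id` parameter are illustrative, and the Hub identifier is not stated in this card, so it has to be supplied by the caller.

```python
def tag_text(text, model_id):
    """Return grouped punctuation/casing predictions for lowercase, unpunctuated text.

    `model_id` is a Hub identifier or local path for this model
    (an assumption: it is not given in this card).
    """
    # Imported lazily so the helper can be defined without loading transformers.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_id,
        # Merge adjacent tokens with the same label into one group,
        # which yields "entity_group" entries in the result.
        aggregation_strategy="simple",
    )
    return tagger(text)


# Example call (downloads the weights on first use); MODEL_ID is a placeholder:
# entities = tag_text("henrique foi no lago pescar com o pedro", MODEL_ID)
```

Each returned item should carry a two-character label such as `.U` or `OO`, which still has to be applied to the words to obtain the restored text.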

-----------------------------------------------
## 🎯 Accuracy

| Label                              | Precision | Recall | F1-score | Support |
| ---------------------------------- | --------- | ------ | -------- | ------- |
| **Upper - OU**                     | 0.89      | 0.91   | 0.90     | 69376   |
| **None - OO**                      | 0.99      | 0.98   | 0.98     | 857659  |
| **Full stop/period - .O**          | 0.86      | 0.93   | 0.89     | 60410   |
| **Comma - ,O**                     | 0.85      | 0.83   | 0.84     | 48608   |
| **Upper + comma - ,U**             | 0.73      | 0.76   | 0.75     | 3521    |
| **Question mark - ?O**             | 0.68      | 0.78   | 0.73     | 1168    |
| **Upper + period - .U**            | 0.66      | 0.72   | 0.69     | 1884    |
| **Upper + colon - :U**             | 0.59      | 0.63   | 0.61     | 352     |
| **Colon - :O**                     | 0.70      | 0.53   | 0.60     | 2420    |
| **Upper + question mark - ?U**     | 0.50      | 0.56   | 0.53     | 36      |
| **Upper + exclamation mark - !U**  | 0.38      | 0.32   | 0.34     | 38      |
| **Exclamation mark - !O**          | 0.30      | 0.05   | 0.08     | 783     |
| **Semicolon - ;O**                 | 0.35      | 0.04   | 0.08     | 1557    |
| **Apostrophe - 'O**                | 0.00      | 0.00   | 0.00     | 3       |
| **Hyphen - -O**                    | 0.00      | 0.00   | 0.00     | 3       |
|                                    |           |        |          |         |
| **accuracy**                       |           |        | 0.96     | 1047818 |
| **macro avg**                      | 0.57      | 0.54   | 0.54     | 1047818 |
| **weighted avg**                   | 0.96      | 0.96   | 0.96     | 1047818 |

-----------------------------------------------
## 🤷 Output

Example:

```json
[
  {
    "entity_group": "OU",
    "score": 0.8026431202888489,
    "word": "henrique",
    "start": 0,
    "end": 8
  },
  {
    "entity_group": "OO",
    "score": 0.9925149083137512,
    "word": "foi no lago pescar com o",
    "start": 9,
    "end": 33
  },
  {
    "entity_group": ".U",
    "score": 0.8426014184951782,
    "word": "pedro",
    "start": 34,
    "end": 39
  },
  {
    "entity_group": "OU",
    "score": 0.9519776105880737,
    "word": "mais",
    "start": 40,
    "end": 44
  },
  {
    "entity_group": ",O",
    "score": 0.8551820516586304,
    "word": "tarde",
    "start": 45,
    "end": 50
  },
  {
    "entity_group": "OO",
    "score": 0.9902807474136353,
    "word": "foram para a casa do",
    "start": 51,
    "end": 71
  },
  {
    "entity_group": "OU",
    "score": 0.9227372407913208,
    "word": "pedro",
    "start": 72,
    "end": 77
  },
  {
    "entity_group": "OO",
    "score": 0.9997054934501648,
    "word": "fritar os",
    "start": 78,
    "end": 87
  },
  {
    "entity_group": ".O",
    "score": 0.9813661575317383,
    "word": "peixes",
    "start": 88,
    "end": 94
  }
]
```

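Applying each label to its words (punctuation appended, first letter upper-cased where the label ends in `U`) gives the restored text: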

```
Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
```
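
The mapping from the JSON output above to the restored sentence can be sketched as follows. This is an illustrative reading of the label scheme, not the author's implementation: it assumes the first character of a label is the punctuation mark to append (`O` for none) and the second is `U` when the word's first letter is upper-cased. The helper names `apply_label` and `rebuild_text` are hypothetical.

```python
def apply_label(word, label):
    """Apply a two-character label such as '.U' or 'OO' to a single word."""
    punct, case = label[0], label[1]
    if case == "U":
        # Upper-case only the first letter; leave the rest untouched.
        word = word[:1].upper() + word[1:]
    if punct != "O":
        # Assumption: the punctuation mark follows the word.
        word += punct
    return word


def rebuild_text(entities):
    """Rebuild text from grouped predictions; a group may span several words."""
    words = []
    for entity in entities:
        for word in entity["word"].split():
            words.append(apply_label(word, entity["entity_group"]))
    return " ".join(words)


# The groups from the example output (scores and offsets omitted).
entities = [
    {"entity_group": "OU", "word": "henrique"},
    {"entity_group": "OO", "word": "foi no lago pescar com o"},
    {"entity_group": ".U", "word": "pedro"},
    {"entity_group": "OU", "word": "mais"},
    {"entity_group": ",O", "word": "tarde"},
    {"entity_group": "OO", "word": "foram para a casa do"},
    {"entity_group": "OU", "word": "pedro"},
    {"entity_group": "OO", "word": "fritar os"},
    {"entity_group": ".O", "word": "peixes"},
]

print(rebuild_text(entities))
# Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
```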
-----------------------------------------------

## 🤙 Contact 

[Maicon Domingues]([email protected]) for questions, feedback and/or requests for similar models.