---
language:
- pt
license: cc-by-4.0
datasets:
- wiki_lingua
thumbnail: null
tags:
- named-entity-recognition
- Transformer
- pytorch
- bert
metrics:
- f1
- precision
- recall
model-index:
- name: rpunct-ptbr
  results:
  - task:
      type: named-entity-recognition
    dataset:
      type: wiki_lingua
      name: wiki_lingua
    metrics:
      - type: f1
        value: 55.70
        name: F1 Score
      - type: precision
        value: 57.72
        name: Precision
      - type: recall
        value: 53.83
        name: Recall
widget:
- text: "henrique foi no lago pescar com o pedro mais tarde foram para a casa do pedro fritar os peixes"
- text: "cinco trabalhadores da construção civil em capacetes e coletes amarelos estão ocupados no trabalho"
- text: "na quinta feira em visita a belo horizonte pedro sobrevoa a cidade atingida pelas chuvas"
- text: "coube ao representante de classe contar que na avaliação de língua portuguesa alguns alunos se mantiveram concentrados e outros dispersos"
---
# 🤗 bert-restore-punctuation-ptbr


* 🪄 [W&B Dashboard](https://wandb.ai/dominguesm/RestorePunctuationPTBR)
* ⛭ [GitHub](https://github.com/DominguesM/respunct)


This is a [bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) model fine-tuned for punctuation restoration on [WikiLingua](https://github.com/esdurmus/Wikilingua).

This model is intended for direct use as a punctuation restoration model for general-purpose Portuguese text. Alternatively, it can be further fine-tuned on domain-specific texts for punctuation restoration.
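Further fine-tuning requires turning punctuated domain-specific text into word-level training labels. The sketch below is a hypothetical helper, `text_to_labels`, which is not part of `respunct`. It assumes the two-character label scheme used in the accuracy table (the punctuation mark that follows a word, or `O`, followed by `U` if the word is capitalized, else `O`) and only considers a single trailing mark per word.

```python
from typing import List, Tuple

# Punctuation marks the model card lists as restorable.
PUNCT_MARKS = set("!?.,-:;'")


def text_to_labels(text: str) -> List[Tuple[str, str]]:
    """Convert punctuated text into (lowercased word, label) pairs.

    Hypothetical label format: first character is the mark that follows
    the word ("O" if none); second is "U" if the word starts with an
    uppercase letter, otherwise "O".
    """
    pairs = []
    for token in text.split():
        punct = "O"
        if token[-1] in PUNCT_MARKS:
            punct = token[-1]   # remember the trailing mark
            token = token[:-1]  # and strip it from the word
        if not token:
            continue  # skip tokens that were punctuation only
        case = "U" if token[0].isupper() else "O"
        pairs.append((token.lower(), punct + case))
    return pairs


print(text_to_labels("Mais tarde, foram"))
# [('mais', 'OU'), ('tarde', ',O'), ('foram', 'OO')]
```

The lowercased words become the model input and the labels become the token-classification targets.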

The model restores the following punctuation marks: **! ? . , - : ; '**

It also restores word capitalization.
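Restoring punctuation and casing amounts to applying one label per word. The sketch below assumes the same two-character labels as the accuracy table. `labels_to_text` is a hypothetical helper, and the labels are written by hand to reproduce the output of the usage example; they are not model predictions.

```python
from typing import List


def labels_to_text(words: List[str], labels: List[str]) -> str:
    """Rebuild punctuated, cased text from per-word labels (assumed format)."""
    if len(words) != len(labels):
        raise ValueError("expected exactly one label per word")
    out = []
    for word, label in zip(words, labels):
        punct, case = label[0], label[1]
        if case == "U":
            word = word[:1].upper() + word[1:]  # capitalize the first letter
        if punct != "O":
            word += punct  # append the punctuation mark after the word
        out.append(word)
    return " ".join(out)


words = ("henrique foi no lago pescar com o pedro mais tarde foram "
         "para a casa do pedro fritar os peixes").split()
labels = ["OO"] * len(words)
labels[0] = "OU"   # Henrique
labels[7] = ".U"   # Pedro.
labels[8] = "OU"   # Mais
labels[9] = ",O"   # tarde,
labels[15] = "OU"  # Pedro
labels[18] = ".O"  # peixes.

print(labels_to_text(words, labels))
# Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
```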

-----------------------------------------------

## 🤷 Usage

🇧🇷 An easy-to-use package to restore punctuation in Portuguese texts.

**Below is a quick way to use the model.**

1. First, install the package.

```bash
pip install respunct
```

2. Sample Python code.

``` python
from respunct import RestorePuncts

# Instantiate the punctuation restorer (this loads the model)
model = RestorePuncts()

text = "henrique foi no lago pescar com o pedro mais tarde foram para a casa do pedro fritar os peixes"
print(model.restore_puncts(text))
# output:
# Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
```

-----------------------------------------------
## 🎯 Accuracy

Each label pairs the punctuation mark that follows a word (`O` for none) with the word's casing (`U` for capitalized, `O` for unchanged).

| label                                | precision | recall | f1-score | support |
| ------------------------------------ | --------- | ------ | -------- | ------- |
| **Upper - OU**                       |   0.89    |  0.91  |   0.90   |   69376 |
| **None - OO**                        |   0.99    |  0.98  |   0.98   |  857659 |
| **Full stop/period - .O**            |   0.86    |  0.93  |   0.89   |   60410 |
| **Comma - ,O**                       |   0.85    |  0.83  |   0.84   |   48608 |
| **Upper + comma - ,U**               |   0.73    |  0.76  |   0.75   |    3521 |
| **Question mark - ?O**               |   0.68    |  0.78  |   0.73   |    1168 |
| **Upper + period - .U**              |   0.66    |  0.72  |   0.69   |    1884 |
| **Upper + colon - :U**               |   0.59    |  0.63  |   0.61   |     352 |
| **Colon - :O**                       |   0.70    |  0.53  |   0.60   |    2420 |
| **Upper + question mark - ?U**       |   0.50    |  0.56  |   0.53   |      36 |
| **Upper + exclamation mark - !U**    |   0.38    |  0.32  |   0.34   |      38 |
| **Exclamation mark - !O**            |   0.30    |  0.05  |   0.08   |     783 |
| **Semicolon - ;O**                   |   0.35    |  0.04  |   0.08   |    1557 |
| **Apostrophe - 'O**                  |   0.00    |  0.00  |   0.00   |       3 |
| **Hyphen - -O**                      |   0.00    |  0.00  |   0.00   |       3 |
|                                      |           |        |          |         |
| **accuracy**                         |           |        |   0.96   | 1047818 |
| **macro avg**                        |   0.57    |  0.54  |   0.54   | 1047818 |
| **weighted avg**                     |   0.96    |  0.96  |   0.96   | 1047818 |
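The summary rows follow from the per-label rows: the macro average is the unweighted mean over the 15 labels, while the weighted average weights each label by its support. The sketch below recomputes both F1 averages from the rounded per-label values in the table. The small gap from the reported macro F1 (about 0.535 vs 0.54) is presumably due to the per-label scores being rounded. The sketch also shows why the macro average sits far below the 0.96 accuracy: rare labels such as `!O`, `;O`, `'O` and `-O` score near zero but count as much as `OO` in the unweighted mean.

```python
# Per-label (f1, support) values copied from the accuracy table (rounded to 2 decimals).
ROWS = {
    "OU": (0.90, 69376),
    "OO": (0.98, 857659),
    ".O": (0.89, 60410),
    ",O": (0.84, 48608),
    ",U": (0.75, 3521),
    "?O": (0.73, 1168),
    ".U": (0.69, 1884),
    ":U": (0.61, 352),
    ":O": (0.60, 2420),
    "?U": (0.53, 36),
    "!U": (0.34, 38),
    "!O": (0.08, 783),
    ";O": (0.08, 1557),
    "'O": (0.00, 3),
    "-O": (0.00, 3),
}

total_support = sum(s for _, s in ROWS.values())
# Macro average: every label counts equally, regardless of frequency.
macro_f1 = sum(f for f, _ in ROWS.values()) / len(ROWS)
# Weighted average: each label's F1 is weighted by its support.
weighted_f1 = sum(f * s for f, s in ROWS.values()) / total_support

print(total_support)          # 1047818
print(round(macro_f1, 4))     # 0.5347
print(round(weighted_f1, 4))  # 0.9584
```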

-----------------------------------------------


## 🤙 Contact 

Contact [Maicon Domingues]([email protected]) with questions, feedback, or requests for similar models.