---
language:
- pt
license: cc-by-4.0
datasets:
- wiki_lingua
thumbnail: null
tags:
- named-entity-recognition
- Transformer
- pytorch
- bert
metrics:
- f1
- precision
- recall
model-index:
- name: rpunct-ptbr
  results:
  - task:
      type: named-entity-recognition
    dataset:
      type: wiki_lingua
      name: wiki_lingua
    metrics:
      - type: f1
        value: 55.70
        name: F1 Score
      - type: precision
        value: 57.72
        name: Precision
      - type: recall
        value: 53.83
        name: Recall
widget:
- text: "henrique foi no lago pescar com o pedro mais tarde foram para a casa do pedro fritar os peixes"
- text: "cinco trabalhadores da construção civil em capacetes e coletes amarelos estão ocupados no trabalho"
- text: "na quinta feira em visita a belo horizonte pedro sobrevoa a cidade atingida pelas chuvas"
- text: "coube ao representante de classe contar que na avaliação de língua portuguesa alguns alunos se mantiveram concentrados e outros dispersos"
---
# 🤗 bert-restore-punctuation-ptbr


* 🪄 [W&B Dashboard](https://wandb.ai/dominguesm/RestorePunctuationPTBR)


**Coming soon python package for simpler use.**

This is a [bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) model finetuned for punctuation restoration on [WikiLingua](https://github.com/esdurmus/Wikilingua). 

This model is intended for direct use as a punctuation restoration model for the general Portuguese language. Alternatively, you can use this for further fine-tuning on domain-specific texts for punctuation restoration tasks.

Model restores the following punctuations -- **[! ? . , - : ; ' ]**

The model also restores the upper-casing of words.

-----------------------------------------------
## 🎯 Accuracy

|  label                    |   precision  |  recall | f1-score  | support|
| ------------------------- | -------------|-------- | ----------|--------|
| **Upper            - OU** |      0.89    |  0.91   |   0.90    |  69376
| **None             - OO** |      0.99    |  0.98   |   0.98    | 857659
| **Full stop/period - .O** |      0.86    |  0.93   |   0.89    |  60410
| **Comma            - ,O** |      0.85    |  0.83   |   0.84    |  48608
| **Upper + Comma    - ,U** |      0.73    |  0.76   |   0.75    |   3521
| **Question         - ?O** |      0.68    |  0.78   |   0.73    |   1168
| **Upper + period   - .U** |      0.66    |  0.72   |   0.69    |   1884
| **Upper + colon    - :U** |      0.59    |  0.63   |   0.61    |    352
| **Colon            - :O** |      0.70    |  0.53   |   0.60    |   2420
| **Question Mark    - ?U** |      0.50    |  0.56   |   0.53    |     36
| **Upper + Exclam.  - !U** |      0.38    |  0.32   |   0.34    |     38
| **Exclamation Mark - !O** |      0.30    |  0.05   |   0.08    |    783
| **Semicolon        - ;O** |      0.35    |  0.04   |   0.08    |   1557
| **Apostrophe       - 'O** |      0.00    |  0.00   |   0.00    |      3
| **Hyphen           - -O** |      0.00    |  0.00   |   0.00    |      3
|                           |              |         |           |
| **accuracy**              |              |         |   0.96    | 1047818
| **macro avg**             |      0.57    |  0.54   |   0.54    | 1047818
| **weighted avg**          |      0.96    |  0.96   |   0.96    | 1047818

-----------------------------------------------
## 🤷 Output

Example:

```json
[
  {
    "entity_group": "OU",
    "score": 0.8026431202888489,
    "word": "henrique",
    "start": 0,
    "end": 8
  },
  {
    "entity_group": "OO",
    "score": 0.9925149083137512,
    "word": "foi no lago pescar com o",
    "start": 9,
    "end": 33
  },
  {
    "entity_group": ".U",
    "score": 0.8426014184951782,
    "word": "pedro",
    "start": 34,
    "end": 39
  },
  {
    "entity_group": "OU",
    "score": 0.9519776105880737,
    "word": "mais",
    "start": 40,
    "end": 44
  },
  {
    "entity_group": ",O",
    "score": 0.8551820516586304,
    "word": "tarde",
    "start": 45,
    "end": 50
  },
  {
    "entity_group": "OO",
    "score": 0.9902807474136353,
    "word": "foram para a casa do",
    "start": 51,
    "end": 71
  },
  {
    "entity_group": "OU",
    "score": 0.9227372407913208,
    "word": "pedro",
    "start": 72,
    "end": 77
  },
  {
    "entity_group": "OO",
    "score": 0.9997054934501648,
    "word": "fritar os",
    "start": 78,
    "end": 87
  },
  {
    "entity_group": ".O",
    "score": 0.9813661575317383,
    "word": "peixes",
    "start": 88,
    "end": 94
  }
]
```

This output refers to:

```
Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
```
-----------------------------------------------

## 🤙 Contact 

[Maicon Domingues](dominguesm@outlook.com) for questions, feedback and/or requests for similar models.