metadata
language:
- pt
license: cc-by-4.0
datasets:
- wiki_lingua
thumbnail: null
tags:
- named-entity-recognition
- Transformer
- pytorch
- bert
metrics:
- f1
- precision
- recall
model-index:
- name: rpunct-ptbr
results:
- task:
type: named-entity-recognition
dataset:
type: wiki_lingua
name: wiki_lingua
metrics:
- type: f1
value: 55.7
name: F1 Score
- type: precision
value: 57.72
name: Precision
- type: recall
value: 53.83
name: Recall
widget:
- text: >-
henrique foi no lago pescar com o pedro mais tarde foram para a casa do
pedro fritar os peixes
- text: >-
cinco trabalhadores da construção civil em capacetes e coletes amarelos
estão ocupados no trabalho
- text: >-
na quinta feira em visita a belo horizonte pedro sobrevoa a cidade
atingida pelas chuvas
- text: >-
coube ao representante de classe contar que na avaliação de língua
portuguesa alguns alunos se mantiveram concentrados e outros dispersos
🤗 bert-restore-punctuation-ptbr
Coming soon python package for simpler use.
This is a bert-base-portuguese-cased model finetuned for punctuation restoration on WikiLingua.
This model is intended for direct use as a punctuation restoration model for the general Portuguese language. Alternatively, you can use this for further fine-tuning on domain-specific texts for punctuation restoration tasks.
Model restores the following punctuations -- [! ? . , - : ; ' ]
The model also restores the upper-casing of words.
🎯 Accuracy
label | precision | recall | f1-score | support |
---|---|---|---|---|
Upper - OU | 0.89 | 0.91 | 0.90 | 69376 |
None - OO | 0.99 | 0.98 | 0.98 | 857659 |
Full stop/period - .O | 0.86 | 0.93 | 0.89 | 60410 |
Comma - ,O | 0.85 | 0.83 | 0.84 | 48608 |
Upper + Comma - ,U | 0.73 | 0.76 | 0.75 | 3521 |
Question - ?O | 0.68 | 0.78 | 0.73 | 1168 |
Upper + period - .U | 0.66 | 0.72 | 0.69 | 1884 |
Upper + colon - :U | 0.59 | 0.63 | 0.61 | 352 |
Colon - :O | 0.70 | 0.53 | 0.60 | 2420 |
Question Mark - ?U | 0.50 | 0.56 | 0.53 | 36 |
Upper + Exclam. - !U | 0.38 | 0.32 | 0.34 | 38 |
Exclamation Mark - !O | 0.30 | 0.05 | 0.08 | 783 |
Semicolon - ;O | 0.35 | 0.04 | 0.08 | 1557 |
Apostrophe - 'O | 0.00 | 0.00 | 0.00 | 3 |
Hyphen - -O | 0.00 | 0.00 | 0.00 | 3 |
accuracy | 0.96 | 1047818 | ||
macro avg | 0.57 | 0.54 | 0.54 | 1047818 |
weighted avg | 0.96 | 0.96 | 0.96 | 1047818 |
🤷 Output
Example:
[
{
"entity_group": "OU",
"score": 0.8026431202888489,
"word": "henrique",
"start": 0,
"end": 8
},
{
"entity_group": "OO",
"score": 0.9925149083137512,
"word": "foi no lago pescar com o",
"start": 9,
"end": 33
},
{
"entity_group": ".U",
"score": 0.8426014184951782,
"word": "pedro",
"start": 34,
"end": 39
},
{
"entity_group": "OU",
"score": 0.9519776105880737,
"word": "mais",
"start": 40,
"end": 44
},
{
"entity_group": ",O",
"score": 0.8551820516586304,
"word": "tarde",
"start": 45,
"end": 50
},
{
"entity_group": "OO",
"score": 0.9902807474136353,
"word": "foram para a casa do",
"start": 51,
"end": 71
},
{
"entity_group": "OU",
"score": 0.9227372407913208,
"word": "pedro",
"start": 72,
"end": 77
},
{
"entity_group": "OO",
"score": 0.9997054934501648,
"word": "fritar os",
"start": 78,
"end": 87
},
{
"entity_group": ".O",
"score": 0.9813661575317383,
"word": "peixes",
"start": 88,
"end": 94
}
]
This output refers to:
Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
🤙 Contact
Maicon Domingues for questions, feedback and/or requests for similar models.