---
language:
- pt
license: cc-by-4.0
datasets:
- wiki_lingua
thumbnail: null
tags:
- named-entity-recognition
- Transformer
- pytorch
- bert
metrics:
- f1
- precision
- recall
model-index:
- name: rpunct-ptbr
  results:
  - task:
      type: named-entity-recognition
    dataset:
      type: wiki_lingua
      name: wiki_lingua
    metrics:
    - type: f1
      value: 55.70
      name: F1 Score
    - type: precision
      value: 57.72
      name: Precision
    - type: recall
      value: 53.83
      name: Recall
widget:
- text: "henrique foi no lago pescar com o pedro mais tarde foram para a casa do pedro fritar os peixes"
- text: "cinco trabalhadores da construção civil em capacetes e coletes amarelos estão ocupados no trabalho"
- text: "na quinta feira em visita a belo horizonte pedro sobrevoa a cidade atingida pelas chuvas"
- text: "coube ao representante de classe contar que na avaliação de língua portuguesa alguns alunos se mantiveram concentrados e outros dispersos"
---
# 🤗 bert-restore-punctuation-ptbr

* 🪄 [W&B Dashboard](https://wandb.ai/dominguesm/RestorePunctuationPTBR)

A Python package for simpler use is coming soon.

This is a [bert-base-portuguese-cased](https://huggingface.co/neuralmind/bert-base-portuguese-cased) model fine-tuned for punctuation restoration on [WikiLingua](https://github.com/esdurmus/Wikilingua).

The model is intended for direct use as a punctuation restoration model for general-purpose Portuguese text. Alternatively, you can fine-tune it further on domain-specific texts for punctuation restoration tasks.
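
For further fine-tuning, word-level labels have to be mapped onto BERT's WordPiece sub-tokens. This is a minimal sketch of one common convention, assuming a fast tokenizer whose `word_ids()` output marks special tokens with `None`: the first sub-token of each word carries the word's label, and every other position gets `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The `LABELS` order, `align_labels`, and the sample tokenization are hypothetical; the mapping the released model actually uses is stored in its `config.id2label`, and this is not necessarily how the author trained it.

```python
from typing import List, Optional

# Hypothetical label order, taken from the Accuracy table.
# The released model's real mapping is in model.config.id2label.
LABELS = ["OU", "OO", ".O", ",O", ",U", "?O", ".U", ":U",
          ":O", "?U", "!U", "!O", ";O", "'O", "-O"]
LABEL2ID = {label: idx for idx, label in enumerate(LABELS)}
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids: List[Optional[int]], word_labels: List[str]) -> List[int]:
    """Assign each word's label id to its first sub-token.

    word_ids: one entry per sub-token, as returned by a fast tokenizer's
    word_ids(); None marks special tokens such as [CLS] and [SEP].
    word_labels: one two-character label per word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            # Special tokens and continuation sub-tokens are ignored by the loss.
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(LABEL2ID[word_labels[word_id]])
        previous = word_id
    return aligned


# Hypothetical tokenization of "henrique foi pescar":
# [CLS] henrique foi pes ##car [SEP]
print(align_labels([None, 0, 1, 2, 2, None], ["OU", "OO", ".O"]))
# → [-100, 0, 1, 2, -100, -100]
```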

The model restores the following punctuation marks: `!` `?` `.` `,` `-` `:` `;` `'`

The model also restores the capitalization (upper-casing) of words.
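
Both targets can be encoded as a single label per word. This sketch assumes the two-character scheme used in the Accuracy table and the output example (the punctuation mark after the word, or `O`; then `U` for an upper-case first letter, or `O`). It turns punctuated, cased text into lower-cased input words with their labels, which is one way training pairs could be built. `word_labels` is a hypothetical helper, not part of any released package, and it only treats a single trailing mark per word as the label.

```python
from typing import List, Tuple

# Marks listed in this model card; only a trailing mark counts as a label.
PUNCTUATION = set("!?.,-:;'")


def word_labels(text: str) -> List[Tuple[str, str]]:
    """Split punctuated, cased text into (lower-cased word, label) pairs."""
    pairs = []
    for token in text.split():
        mark = "O"
        # Strip one trailing mark, but never reduce a token to nothing.
        if len(token) > 1 and token[-1] in PUNCTUATION:
            mark, token = token[-1], token[:-1]
        case = "U" if token[0].isupper() else "O"
        pairs.append((token.lower(), mark + case))
    return pairs


print(word_labels("Mais tarde, foram para a casa do Pedro."))
```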

-----------------------------------------------
## 🎯 Accuracy

Each label combines the punctuation mark that follows a word (`O` for none) with the word's casing (`U` for an upper-case first letter, `O` for unchanged).

| label | precision | recall | f1-score | support |
| --- | ---: | ---: | ---: | ---: |
| Upper - OU | 0.89 | 0.91 | 0.90 | 69376 |
| None - OO | 0.99 | 0.98 | 0.98 | 857659 |
| Full stop/period - .O | 0.86 | 0.93 | 0.89 | 60410 |
| Comma - ,O | 0.85 | 0.83 | 0.84 | 48608 |
| Upper + Comma - ,U | 0.73 | 0.76 | 0.75 | 3521 |
| Question - ?O | 0.68 | 0.78 | 0.73 | 1168 |
| Upper + period - .U | 0.66 | 0.72 | 0.69 | 1884 |
| Upper + colon - :U | 0.59 | 0.63 | 0.61 | 352 |
| Colon - :O | 0.70 | 0.53 | 0.60 | 2420 |
| Upper + Question - ?U | 0.50 | 0.56 | 0.53 | 36 |
| Upper + Exclam. - !U | 0.38 | 0.32 | 0.34 | 38 |
| Exclamation Mark - !O | 0.30 | 0.05 | 0.08 | 783 |
| Semicolon - ;O | 0.35 | 0.04 | 0.08 | 1557 |
| Apostrophe - 'O | 0.00 | 0.00 | 0.00 | 3 |
| Hyphen - -O | 0.00 | 0.00 | 0.00 | 3 |
| accuracy | | | 0.96 | 1047818 |
| macro avg | 0.57 | 0.54 | 0.54 | 1047818 |
| weighted avg | 0.96 | 0.96 | 0.96 | 1047818 |
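
The gap between the macro average (0.54 F1) and the weighted average (0.96 F1) comes from class imbalance: the macro average weights all 15 labels equally, while the weighted average weights each label by its support, so the dominant `OO` label (857,659 of 1,047,818 in support) largely determines it. This sketch recomputes both averages from the rounded per-label values in the table, so results may differ from the reported figures in the second decimal place. It also checks that each f1-score is the harmonic mean of precision and recall.

```python
# (label, precision, recall, f1-score, support), copied from the table above
ROWS = [
    ("OU", 0.89, 0.91, 0.90, 69376),
    ("OO", 0.99, 0.98, 0.98, 857659),
    (".O", 0.86, 0.93, 0.89, 60410),
    (",O", 0.85, 0.83, 0.84, 48608),
    (",U", 0.73, 0.76, 0.75, 3521),
    ("?O", 0.68, 0.78, 0.73, 1168),
    (".U", 0.66, 0.72, 0.69, 1884),
    (":U", 0.59, 0.63, 0.61, 352),
    (":O", 0.70, 0.53, 0.60, 2420),
    ("?U", 0.50, 0.56, 0.53, 36),
    ("!U", 0.38, 0.32, 0.34, 38),
    ("!O", 0.30, 0.05, 0.08, 783),
    (";O", 0.35, 0.04, 0.08, 1557),
    ("'O", 0.00, 0.00, 0.00, 3),
    ("-O", 0.00, 0.00, 0.00, 3),
]


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


total_support = sum(row[4] for row in ROWS)

# Macro average: every label counts equally.
macro = [sum(row[i] for row in ROWS) / len(ROWS) for i in (1, 2, 3)]
# Weighted average: each label counts in proportion to its support.
weighted = [sum(row[i] * row[4] for row in ROWS) / total_support for i in (1, 2, 3)]

# Largest gap between a reported f1-score and the recomputed harmonic mean.
max_f1_gap = max(abs(harmonic_f1(p, r) - f1) for _, p, r, f1, _ in ROWS)

print(total_support, [round(x, 3) for x in macro], [round(x, 3) for x in weighted], round(max_f1_gap, 3))
```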

-----------------------------------------------
## 🤷 Output

Example:

```json
[
  {
    "entity_group": "OU",
    "score": 0.8026431202888489,
    "word": "henrique",
    "start": 0,
    "end": 8
  },
  {
    "entity_group": "OO",
    "score": 0.9925149083137512,
    "word": "foi no lago pescar com o",
    "start": 9,
    "end": 33
  },
  {
    "entity_group": ".U",
    "score": 0.8426014184951782,
    "word": "pedro",
    "start": 34,
    "end": 39
  },
  {
    "entity_group": "OU",
    "score": 0.9519776105880737,
    "word": "mais",
    "start": 40,
    "end": 44
  },
  {
    "entity_group": ",O",
    "score": 0.8551820516586304,
    "word": "tarde",
    "start": 45,
    "end": 50
  },
  {
    "entity_group": "OO",
    "score": 0.9902807474136353,
    "word": "foram para a casa do",
    "start": 51,
    "end": 71
  },
  {
    "entity_group": "OU",
    "score": 0.9227372407913208,
    "word": "pedro",
    "start": 72,
    "end": 77
  },
  {
    "entity_group": "OO",
    "score": 0.9997054934501648,
    "word": "fritar os",
    "start": 78,
    "end": 87
  },
  {
    "entity_group": ".O",
    "score": 0.9813661575317383,
    "word": "peixes",
    "start": 88,
    "end": 94
  }
]
```

This output corresponds to the following restored text:

```
Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
```
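
Reading the output is mechanical: each group's label applies to every word in that group, with the first character giving the mark to append (`O` for none) and the second giving the casing (`U` for an upper-case first letter). This sketch rebuilds the restored text from a list of dictionaries shaped like the JSON output. It assumes that output comes from a Hugging Face `token-classification` pipeline with `aggregation_strategy="simple"`, which is an assumption about how the example was produced. `restore_text` is a hypothetical helper, not the author's code.

```python
from typing import Dict, List


def restore_text(groups: List[Dict]) -> str:
    """Rebuild punctuated, cased text from grouped punctuation predictions."""
    words = []
    for group in groups:
        mark, case = group["entity_group"][0], group["entity_group"][1]
        # A group may hold several words sharing one label; apply it to each word.
        for word in group["word"].split():
            if case == "U":
                # Upper-case only the first letter; keep the rest unchanged.
                word = word[0].upper() + word[1:]
            if mark != "O":
                word += mark
            words.append(word)
    return " ".join(words)


# Labels and words from the example output (scores and offsets omitted).
example = [
    {"entity_group": "OU", "word": "henrique"},
    {"entity_group": "OO", "word": "foi no lago pescar com o"},
    {"entity_group": ".U", "word": "pedro"},
    {"entity_group": "OU", "word": "mais"},
    {"entity_group": ",O", "word": "tarde"},
    {"entity_group": "OO", "word": "foram para a casa do"},
    {"entity_group": "OU", "word": "pedro"},
    {"entity_group": "OO", "word": "fritar os"},
    {"entity_group": ".O", "word": "peixes"},
]
print(restore_text(example))
# → Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
```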
-----------------------------------------------

## 🤙 Contact

Contact [Maicon Domingues]([email protected]) with questions, feedback, or requests for similar models.