noeminaepli/swiss_german_stts_pos_model

Token Classification

noeminaepli committed on Jan 14, 2023

Commit 8cdb23a · 1 Parent(s): 0cd9bc2

Update README.md

Files changed (1)

README.md +23 -7

@@ -1,19 +1,35 @@
 ---
 ---
-# swiss\_german\_pos\_model
-The *swiss_german_pos_model* is a part-of-speech tagging model for Swiss German. The model is trained on [Universal POS tags (upos)](https://universaldependencies.org/u/pos/).
 ### Training procedure and data sets
-- Base model: German LM: [dbmdz/bert-base-german-cased](https://huggingface.co/dbmdz/bert-base-german-cased)
-- Continued LM training with [swisscrawl data](https://icosys.ch/swisscrawl)
-- Task fine-tuning on the [UD\_German-HDT](https://github.com/UniversalDependencies/UD_German-HDT/tree/master) data set with [character-level noise](https://aclanthology.org/2022.findings-acl.321/)
-- Task fine-tuning on the Swiss German [NOAH-Corpus](https://noe-eva.github.io/NOAH-Corpus/) (train + dev split)
-Accuracy on NOAH test split: 0.9587
 ### Training hyperparameters

 ---
+language: gsw
+license: cc
 ---
+# Swiss German STTS Part-of-Speech Tagging Model
+The **swiss_german_pos_model** is a part-of-speech tagging model for Swiss German. The model is trained on [STTS POS Tags](https://universaldependencies.org/tagset-conversion/de-stts-uposf.html).
+Note that there is also a model trained on [Universal POS tags (upos)](https://universaldependencies.org/u/pos/): [swiss_german_pos_model](https://huggingface.co/noeminaepli/swiss_german_pos_model).
 ### Training procedure and data sets
+1) Base model: German LM: [dbmdz/bert-base-german-cased](https://huggingface.co/dbmdz/bert-base-german-cased)
+2) Continued LM training with [swisscrawl data](https://icosys.ch/swisscrawl)
+3) Task fine-tuning on the [UD\_German-HDT](https://github.com/UniversalDependencies/UD_German-HDT/tree/master) data set with [character-level noise](https://aclanthology.org/2022.findings-acl.321/)
+4) Task fine-tuning on the Swiss German [NOAH-Corpus](https://noe-eva.github.io/NOAH-Corpus/) (train + dev split)
+- Accuracy on Swiss German NOAH test split: 0.9432
+- Accuracy on German UD_German-HDT test set after GSW fine-tuning: 0.9826 (vs 0.9828 at step 3 before GSW fine-tuning)
+### Usage
+```python
+from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
+model = AutoModelForTokenClassification.from_pretrained("noeminaepli/swiss_german_stts_pos_model")
+tokenizer = AutoTokenizer.from_pretrained("noeminaepli/swiss_german_stts_pos_model")
+pos_tagger = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")
+tokens = pos_tagger("Worum söu mes ned chönne?")
+```
 ### Training hyperparameters
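
Note on the Usage snippet added in this commit: with `aggregation_strategy="simple"`, the `transformers` token-classification pipeline returns a list of dicts carrying the keys `word`, `entity_group`, `score`, `start` and `end`. The sketch below shows one way to turn that list into `(word, tag)` pairs. The helper name `to_word_tag_pairs` is hypothetical, and the sample input is hand-written with illustrative tags and scores, not real output from this model.

```python
from typing import Dict, List, Tuple


def to_word_tag_pairs(pipeline_output: List[Dict]) -> List[Tuple[str, str]]:
    """Convert token-classification pipeline output into (word, tag) pairs."""
    pairs = []
    for item in pipeline_output:
        # With an aggregation strategy the label is stored under "entity_group";
        # without one, the pipeline stores it under "entity".
        tag = item.get("entity_group", item.get("entity"))
        pairs.append((item["word"], tag))
    return pairs


# Hand-written sample in the shape the pipeline returns.
# Tags and scores are illustrative assumptions, not model predictions.
sample = [
    {"entity_group": "PWAV", "score": 0.99, "word": "Worum", "start": 0, "end": 5},
    {"entity_group": "VMFIN", "score": 0.98, "word": "söu", "start": 6, "end": 9},
]

for word, tag in to_word_tag_pairs(sample):
    print(f"{word}\t{tag}")
```

In practice the helper would be called on the `tokens` variable from the snippet in the diff. Because the `"simple"` strategy groups adjacent tokens that share a label, two neighbouring words with the same tag may come back as a single entry, so it can be worth checking the `start` and `end` offsets when exact per-word tags matter.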