noeminaepli committed on
Commit
8cdb23a
1 Parent(s): 0cd9bc2

Update README.md

Files changed (1)
  1. README.md +23 -7
README.md CHANGED
@@ -1,19 +1,35 @@
  ---
  ---

- # swiss\_german\_pos\_model

- The *swiss_german_pos_model* is a part-of-speech tagging model for Swiss German. The model is trained on [Universal POS tags (upos)](https://universaldependencies.org/u/pos/).

  ### Training procedure and data sets

- - Base model: German LM: [dbmdz/bert-base-german-cased](https://huggingface.co/dbmdz/bert-base-german-cased)
- - Continued LM training with [swisscrawl data](https://icosys.ch/swisscrawl)
- - Task fine-tuning on the [UD\_German-HDT](https://github.com/UniversalDependencies/UD_German-HDT/tree/master) data set with [character-level noise](https://aclanthology.org/2022.findings-acl.321/)
- - Task fine-tuning on the Swiss German [NOAH-Corpus](https://noe-eva.github.io/NOAH-Corpus/) (train + dev split)

- Accuracy on NOAH test split: 0.9587



  ### Training hyperparameters
  ---
+ language: gsw
+ license: cc
  ---

+ # Swiss German STTS Part-of-Speech Tagging Model

+ The **swiss_german_stts_pos_model** is a part-of-speech tagging model for Swiss German. The model is trained on [STTS POS Tags](https://universaldependencies.org/tagset-conversion/de-stts-uposf.html).
+ Note that there is also a model trained on [Universal POS tags (upos)](https://universaldependencies.org/u/pos/): [swiss_german_pos_model](https://huggingface.co/noeminaepli/swiss_german_pos_model).

  ### Training procedure and data sets

+ 1) Base model: German LM: [dbmdz/bert-base-german-cased](https://huggingface.co/dbmdz/bert-base-german-cased)
+ 2) Continued LM training with [swisscrawl data](https://icosys.ch/swisscrawl)
+ 3) Task fine-tuning on the [UD\_German-HDT](https://github.com/UniversalDependencies/UD_German-HDT/tree/master) data set with [character-level noise](https://aclanthology.org/2022.findings-acl.321/)
+ 4) Task fine-tuning on the Swiss German [NOAH-Corpus](https://noe-eva.github.io/NOAH-Corpus/) (train + dev split)
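
Step 3 above fine-tunes on Standard German data with character-level noise; the linked paper describes the actual procedure. The block below is only a minimal sketch of what such noise injection could look like, assuming one random insertion, deletion, or substitution per selected word. The names `add_char_noise` and `noise_word`, the `noise_prob` parameter, and the `ALPHABET` constant are hypothetical and do not come from the README or the paper.

```python
import random

# Assumption: lowercase German letters as the pool for inserted/substituted characters.
ALPHABET = "abcdefghijklmnopqrstuvwxyzäöü"


def noise_word(word, rng):
    """Apply one random character edit (insert, delete, or substitute) to a non-empty word."""
    op = rng.choice(["insert", "delete", "substitute"])
    if op == "insert":
        pos = rng.randrange(len(word) + 1)
        return word[:pos] + rng.choice(ALPHABET) + word[pos:]
    pos = rng.randrange(len(word))
    if op == "delete":
        # Keep single-character words intact so a token never becomes empty.
        return word if len(word) == 1 else word[:pos] + word[pos + 1:]
    return word[:pos] + rng.choice(ALPHABET) + word[pos + 1:]


def add_char_noise(tokens, noise_prob=0.1, seed=None):
    """Return a copy of `tokens` where each token is noised with probability `noise_prob`."""
    rng = random.Random(seed)
    return [
        noise_word(tok, rng) if tok and rng.random() < noise_prob else tok
        for tok in tokens
    ]


if __name__ == "__main__":
    sentence = ["Warum", "sollen", "wir", "es", "nicht", "können", "?"]
    print(add_char_noise(sentence, noise_prob=0.3, seed=42))
```

The sketch edits characters inside tokens but never adds or removes tokens, so the token-level POS labels of the training data stay aligned with the noised input.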

+ - Accuracy on Swiss German NOAH test split: 0.9432
+ - Accuracy on German UD_German-HDT test set after GSW fine-tuning: 0.9826 (vs 0.9828 at step 3 before GSW fine-tuning)

+ ### Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
+
+ model = AutoModelForTokenClassification.from_pretrained("noeminaepli/swiss_german_stts_pos_model")
+ tokenizer = AutoTokenizer.from_pretrained("noeminaepli/swiss_german_stts_pos_model")
+
+ pos_tagger = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")
+ tokens = pos_tagger("Worum söu mes ned chönne?")
+
+ ```
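
The usage snippet stores the pipeline result in `tokens` but does not show how to read it. With `aggregation_strategy="simple"`, the token-classification pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. The following sketch turns such a list into `(word, tag)` pairs. The helper name `to_word_tag_pairs` and its `min_score` parameter are hypothetical, and the `sample` list is a hand-written placeholder in the pipeline's output shape, not real output of this model.

```python
def to_word_tag_pairs(predictions, min_score=0.0):
    """Convert aggregated pipeline predictions into (word, tag) tuples.

    Predictions whose confidence score is below `min_score` are dropped.
    """
    return [
        (pred["word"], pred["entity_group"])
        for pred in predictions
        if pred["score"] >= min_score
    ]


# Hand-written placeholder, NOT actual model output: tags and scores are illustrative only.
sample = [
    {"entity_group": "PWAV", "score": 0.99, "word": "Worum", "start": 0, "end": 5},
    {"entity_group": "VMFIN", "score": 0.98, "word": "söu", "start": 6, "end": 9},
]

for word, tag in to_word_tag_pairs(sample):
    print(f"{word}\t{tag}")
```

In practice, `to_word_tag_pairs(tokens)` would be called on the `tokens` variable from the usage snippet instead of on `sample`.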


  ### Training hyperparameters