alanakbik committed on
Commit
d527a1e
1 Parent(s): 9785a9d

Update README.md

Files changed (1)
  1. README.md +110 -11
README.md CHANGED
@@ -9,32 +9,131 @@ datasets:
9
  inference: false
10
  ---
11
 
12
- ## Flair NER model `en-ner-conll03-v0.4.pt`
13
 
14
- Imported from https://nlp.informatik.hu-berlin.de/resources/models/ner/
15
 
16
  ### Demo: How to use in Flair
17
 
 
 
18
  ```python
19
  from flair.data import Sentence
20
  from flair.models import SequenceTagger
21
 
22
- sentence = Sentence(
23
- "My name is Julien, I currently live in Paris, I work at Hugging Face, Inc."
24
- )
25
-
26
- tagger = SequenceTagger.load("julien-c/flair-ner")
27
 
 
 
28
 
29
  # predict NER tags
30
  tagger.predict(sentence)
31
 
32
- # print sentence with predicted tags
33
- print(sentence.to_tagged_string())
34
 
35
 
36
  ```
37
 
38
- yields the following output:
39
 
40
- > `My name is Julien <S-PER> , I currently live in Paris <S-LOC> , I work at Hugging <B-LOC> Face <E-LOC> .`
9
  inference: false
10
  ---
11
 
12
+ ## German NER in Flair (default model)
13
 
14
+ This is the standard 4-class NER model for German that ships with [Flair](https://github.com/flairNLP/flair/).
15
+
16
+ F1-Score: **87.94** (CoNLL-03 German revised)
17
+
18
+ Predicts 4 tags:
19
+
20
+ | **tag** | **meaning** |
21
+ |---------------------------------|-----------|
22
+ | PER | person name |
23
+ | LOC | location name |
24
+ | ORG | organization name |
25
+ | MISC | other name |
26
+
27
+ Based on [Flair embeddings](https://www.aclweb.org/anthology/C18-1139/) and LSTM-CRF.
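For readers unfamiliar with the metric, F1 scores on CoNLL-03 are conventionally computed at the span level: a predicted entity counts as correct only if both its boundaries and its type match the gold annotation. The following is a minimal sketch of that computation, assuming micro-averaging over exact-match spans. The function name `span_f1`, the span representation, and the toy data are hypothetical and are not part of Flair or of this commit.

```python
from typing import Set, Tuple

# Hypothetical span representation: (first_token, last_token, label)
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Micro-averaged F1 over exact-match spans (boundaries and label)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty predictions and the case of no correct spans
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (not taken from the CoNLL-03 corpus)
gold = {(1, 2, "PER"), (5, 5, "LOC")}
predicted = {(1, 2, "PER"), (5, 5, "ORG")}  # second span has the wrong type

score = span_f1(gold, predicted)
print(f"{score:.2f}")
```

In the toy data one of two predicted spans is correct, so precision and recall are both 0.5. The README reports its score as a percentage, which corresponds to multiplying this value by 100.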
28
+
29
+ ---
30
 
31
  ### Demo: How to use in Flair
32
 
33
+ Requires: **[Flair](https://github.com/flairNLP/flair/)** (`pip install flair`)
34
+
35
  ```python
36
  from flair.data import Sentence
37
  from flair.models import SequenceTagger
38
 
39
+ # load tagger
40
+ tagger = SequenceTagger.load("flair/ner-german")
 
 
 
41
 
42
+ # make example sentence
43
+ sentence = Sentence("George Washington ging nach Washington")
44
 
45
  # predict NER tags
46
  tagger.predict(sentence)
47
 
48
+ # print sentence
49
+ print(sentence)
50
+
51
+ # print predicted NER spans
52
+ print('The following NER tags are found:')
53
+ # iterate over entities and print
54
+ for entity in sentence.get_spans('ner'):
55
+     print(entity)
56
+
57
+ ```
58
+
59
+ This yields the following output:
60
+ ```
61
+ Span [1,2]: "George Washington" [− Labels: PER (0.9968)]
62
+ Span [5]: "Washington" [− Labels: LOC (0.9994)]
63
+ ```
64
+
65
+ So, the entities "*George Washington*" (labeled as a **person**) and "*Washington*" (labeled as a **location**) are found in the sentence "*George Washington ging nach Washington*" ("George Washington went to Washington").
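If only the printed output is available (for example in a log file), the span lines can be turned back into structured records. The sketch below assumes the exact line format shown in the output above. `ParsedSpan`, `SPAN_LINE`, and `parse_span_line` are hypothetical helper names, not Flair APIs, and the format may differ between Flair versions.

```python
import re
from typing import List, NamedTuple


class ParsedSpan(NamedTuple):
    token_ids: List[int]  # token positions as printed (1-based in the output above)
    text: str
    label: str
    score: float


# Matches lines such as: Span [1,2]: "George Washington" [− Labels: PER (0.9968)]
SPAN_LINE = re.compile(
    r'Span \[(?P<ids>[\d,]+)\]: "(?P<text>.*)"\s+'
    r'\[.*?Labels: (?P<label>\S+) \((?P<score>[\d.]+)\)\]'
)


def parse_span_line(line: str) -> ParsedSpan:
    """Parse one printed span line; raise ValueError if the format differs."""
    match = SPAN_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognized span line: {line!r}")
    token_ids = [int(i) for i in match.group("ids").split(",")]
    return ParsedSpan(
        token_ids,
        match.group("text"),
        match.group("label"),
        float(match.group("score")),
    )


# The two lines from the output shown above
printed = [
    'Span [1,2]: "George Washington" [− Labels: PER (0.9968)]',
    'Span [5]: "Washington" [− Labels: LOC (0.9994)]',
]
spans = [parse_span_line(line) for line in printed]
for span in spans:
    print(span.text, span.label, span.score)
```

When the `Sentence` object itself is at hand, reading the spans from `sentence.get_spans('ner')` directly is preferable to parsing text.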
66
+
67
+
68
+ ---
69
+
70
+ ### Training: Script to train this model
71
+
72
+ The following Flair script was used to train this model:
73
 
74
+ ```python
75
+ from flair.data import Corpus
76
+ from flair.datasets import CONLL_03_GERMAN
77
+ from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings
78
+
79
+ # 1. get the corpus
80
+ corpus: Corpus = CONLL_03_GERMAN()
81
+
82
+ # 2. what tag do we want to predict?
83
+ tag_type = 'ner'
84
+
85
+ # 3. make the tag dictionary from the corpus
86
+ tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
87
+
88
+ # 4. initialize each embedding we use
89
+ embedding_types = [
90
+
91
+     # FastText word embeddings for German
92
+     WordEmbeddings('de'),
93
+
94
+     # contextual string embeddings, forward
95
+     FlairEmbeddings('de-forward'),
96
+
97
+     # contextual string embeddings, backward
98
+     FlairEmbeddings('de-backward'),
99
+ ]
100
+
101
+ # embedding stack consists of Flair and FastText word embeddings
102
+ embeddings = StackedEmbeddings(embeddings=embedding_types)
103
+
104
+ # 5. initialize sequence tagger
105
+ from flair.models import SequenceTagger
106
+
107
+ tagger = SequenceTagger(hidden_size=256,
108
+                         embeddings=embeddings,
109
+                         tag_dictionary=tag_dictionary,
110
+                         tag_type=tag_type)
111
+
112
+ # 6. initialize trainer
113
+ from flair.trainers import ModelTrainer
114
 
115
+ trainer = ModelTrainer(tagger, corpus)
116
+
117
+ # 7. run training
118
+ trainer.train('resources/taggers/ner-german',
119
+               train_with_dev=True,
120
+               max_epochs=150)
121
  ```
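The training script combines three embeddings into one stack. Conceptually, stacking means that each token's vectors from the individual embeddings are concatenated into a single longer vector, which the sequence tagger then consumes. The following is a minimal pure-Python sketch of that idea. The function `stack` and the tiny vectors are hypothetical illustrations: real embeddings have hundreds or thousands of dimensions, and Flair's own implementation works on tensors.

```python
from typing import List


def stack(per_model_vectors: List[List[float]]) -> List[float]:
    """Concatenate one token's vectors from several embedding models."""
    stacked: List[float] = []
    for vector in per_model_vectors:
        stacked.extend(vector)
    return stacked


# Toy vectors for a single token (dimensions chosen only for illustration)
word_vector = [0.1, 0.2, 0.3]   # stands in for the word embedding
forward_vector = [0.4, 0.5]     # stands in for the forward contextual embedding
backward_vector = [0.6, 0.7]    # stands in for the backward contextual embedding

token_vector = stack([word_vector, forward_vector, backward_vector])
print(len(token_vector), token_vector)
```

The length of the stacked vector is the sum of the individual lengths, so adding or removing an embedding changes the tagger's input size.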
122
 
 
123
 
124
+
125
+ ---
126
+
127
+ ### Cite
128
+
129
+ Please cite the following paper when using this model.
130
+
131
+ ```
132
+ @inproceedings{akbik2018coling,
133
+ title={Contextual String Embeddings for Sequence Labeling},
134
+ author={Akbik, Alan and Blythe, Duncan and Vollgraf, Roland},
135
+ booktitle = {{COLING} 2018, 27th International Conference on Computational Linguistics},
136
+ pages = {1638--1649},
137
+ year = {2018}
138
+ }
139
+ ```