---
tags:
  - flair
  - hunflair
  - token-classification
  - sequence-tagger-model
language: en
widget:
  - text: It contains a functional GCGGCGGCG Egr-1-binding site
---

## HunFlair2 model for TFBS

[HunFlair2](https://github.com/flairNLP/flair/blob/master/resources/docs/HUNFLAIR2.md) (biomedical Flair) for the transcription factor binding site (TFBS) entity:

- pre-trained language model: michiyasunaga/BioLinkBERT-base
- fine-tuned on the RegEl corpus for the `Tfbs` entity type

Predicts one tag:

| **tag** | **meaning**                              |
| ------- | ---------------------------------------- |
| Tfbs    | DNA region bound by transcription factor |

______________________________________________________________________

## Info

### Demo: How to use in Flair

Requires:

- **[Flair](https://github.com/flairNLP/flair/)>=0.14.0** (`pip install flair` or `pip install git+https://github.com/flairNLP/flair.git`)
- **scispacy** and its `en_core_sci_sm` model, which `SciSpacyTokenizer` depends on

```python
from flair.data import Sentence
from flair.nn import Classifier
from flair.tokenization import SciSpacyTokenizer

text = "We found that Egr-1 specifically binds to the PTEN 5' untranslated region, which contains a functional GCGGCGGCG Egr-1-binding site."
sentence = Sentence(text, use_tokenizer=SciSpacyTokenizer())

tagger = Classifier.load("regel-corpus/hunflair2-regel-tfbs")
tagger.predict(sentence)

print('The following NER tags are found:')
# iterate over the predicted entity spans and print each one
for entity in sentence.get_spans('ner'):
    print(entity)
```
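
For downstream processing it can be handy to turn the predicted spans into plain dictionaries instead of printing them. The following is a minimal sketch, assuming the spans expose `text`, `start_position`, `end_position`, `tag`, and `score` as Flair spans do. `StandInSpan` and `spans_to_records` are hypothetical names introduced here so that the snippet runs without downloading the model, and the span values are placeholders, not actual model output.

```python
from dataclasses import dataclass


@dataclass
class StandInSpan:
    """Hypothetical stand-in exposing the attributes read from a Flair span."""
    text: str
    start_position: int
    end_position: int
    tag: str
    score: float


def spans_to_records(spans):
    """Convert span-like objects into plain dictionaries."""
    return [
        {
            "text": span.text,
            "start": span.start_position,
            "end": span.end_position,
            "tag": span.tag,
            "score": round(span.score, 4),
        }
        for span in spans
    ]


text = "It contains a functional GCGGCGGCG Egr-1-binding site"
mention = "GCGGCGGCG Egr-1-binding site"
start = text.find(mention)

# Placeholder span: boundaries and score are illustrative, not a model prediction.
example = StandInSpan(mention, start, start + len(mention), "Tfbs", 0.5)

records = spans_to_records([example])
print(records)
```

With a real prediction, the same helper could be called as `spans_to_records(sentence.get_spans('ner'))` after `tagger.predict(sentence)`.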