metadata
base_model:
  - cointegrated/rubert-tiny2
datasets:
  - Mykes/patient_queries_ner_SDDCS
language:
  - ru
library_name: transformers
tags:
  - biology
  - medical


rubert_ner_SDDCS

SDDCS is an abbreviation of the NER entity types SYMPTOMS, DISEASES, DRUGS, CITIES, and SUBWAY STATIONS. The model additionally predicts GENDER, AGE, and SPECIALITY entities. This is a fine-tuned Named Entity Recognition (NER) model based on cointegrated/rubert-tiny2, with only 29.4M parameters, designed to detect entities such as diseases, drugs, and symptoms in Russian medical text.

rubert_ner_SDDCS

The med_ner_SDDCS model extracts named entities from patient queries. The abbreviation SDDCS lists the entity types: S for symptoms, D for diseases, D for drugs, C for city, S for subway station. The model also tags GENDER (an indication of sex) and AGE (an indication of age). It is based on the compact rubert-tiny2 model with 29.4 million parameters, which makes it well suited to running on a server with modest hardware requirements.

Model Details

Entities Recognized:

  • GENDER (e.g., женщина, мужчина) 👩👨
  • DISEASE (e.g., паническое расстройство, грипп, ...) 🤒
  • SYMPTOM (e.g., тревога, одышка, ...) 🩺
  • SPECIALITY (e.g., невролог, кардиолог, ...) 👩‍⚕️
  • CITY (e.g., Тула, Москва, Иркутск, ...) 🏙️
  • SUBWAY (e.g., Шоссе Энтузиастов, Проспект Мира, ...) 🚇
  • DRUG (e.g., кардиомагнил, ципралекс) 💊
  • AGE (e.g., ребенок, пожилой) 🧒🏼👴

Model Performance

The fine-tuned model has achieved the following performance metrics:

              precision    recall  f1-score   support

         AGE       1.00      1.00      1.00       583
        CITY       1.00      1.00      1.00      5244
     DISEASE       0.99      1.00      1.00      6569
        DRUG       1.00      1.00      1.00      8220
      GENDER       1.00      1.00      1.00       664
  SPECIALITY       1.00      0.98      0.99      4207
      SUBWAY       1.00      1.00      1.00      1084
     SYMPTOM       1.00      1.00      1.00      8979

   micro avg       1.00      1.00      1.00     35550
   macro avg       1.00      1.00      1.00     35550
weighted avg       1.00      1.00      1.00     35550
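
As a quick consistency check, the averaged rows can be recomputed from the per-entity rows. The sketch below copies the rounded F1 scores and supports from the report, so it is only as precise as the table's two decimals, and the variable names are illustrative rather than part of any evaluation script used for this model.

```python
# Per-entity (f1, support) pairs copied from the report above (rounded to 2 decimals).
report = {
    "AGE": (1.00, 583),
    "CITY": (1.00, 5244),
    "DISEASE": (1.00, 6569),
    "DRUG": (1.00, 8220),
    "GENDER": (1.00, 664),
    "SPECIALITY": (0.99, 4207),
    "SUBWAY": (1.00, 1084),
    "SYMPTOM": (1.00, 8979),
}

total_support = sum(support for _, support in report.values())

# Macro average: unweighted mean over entity types.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: mean over entity types, weighted by support.
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

print(total_support)
print(f"{macro_f1:.2f} {weighted_f1:.2f}")
```

The supports sum to the 35550 shown in the average rows, and both averages round to 1.00, matching the table.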

When to use

You can use this model with the Hugging Face transformers 🤗 library to perform Named Entity Recognition (NER) in the Russian medical domain, mainly on patient queries.

Here's how to load and use the model:

Load the tokenizer and model with transformers

from transformers import pipeline

pipe = pipeline(task="ner", model='Mykes/rubert_ner_SDDCS', tokenizer='Mykes/rubert_ner_SDDCS', aggregation_strategy="max")
# The misspellings in the query below are intentional
query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство. Подскажи психиатра в районе метро Октбрьской."
pipe(query.lower())

Result:

[{'entity_group': 'AGE',
  'score': 0.99993,
  'word': 'ребенка',
  'start': 2,
  'end': 9},
 {'entity_group': 'SYMPTOM',
  'score': 0.9885457,
  'word': 'треога',
  'start': 10,
  'end': 16},
 {'entity_group': 'SYMPTOM',
  'score': 0.9934536,
  'word': 'норушения сна',
  'start': 19,
  'end': 32},
 {'entity_group': 'SYMPTOM',
  'score': 0.9999765,
  'word': 'потеря сознания',
  'start': 34,
  'end': 49},
 {'entity_group': 'DISEASE',
  'score': 0.999972,
  'word': 'паническое расстройство',
  'start': 66,
  'end': 89},
 {'entity_group': 'SPECIALITY',
  'score': 0.85958296,
  'word': 'психиатра',
  'start': 100,
  'end': 109},
 {'entity_group': 'SUBWAY',
  'score': 0.9955049,
  'word': 'октбрьской',
  'start': 125,
  'end': 135}]
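
Downstream code often needs the entities keyed by type rather than as a flat list. The following is a minimal sketch of such post-processing. The helper `group_entities` and its `min_score` threshold are hypothetical and not part of the model or of transformers, and the sample list is a subset of the result above so that the snippet runs without downloading the model.

```python
from collections import defaultdict

def group_entities(ner_results, min_score=0.0):
    """Group aggregated pipeline output by entity type, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for item in ner_results:
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)

# A subset of the result shown above (hypothetical stand-in for a live pipeline call).
sample = [
    {"entity_group": "AGE", "score": 0.99993, "word": "ребенка", "start": 2, "end": 9},
    {"entity_group": "SYMPTOM", "score": 0.9885457, "word": "треога", "start": 10, "end": 16},
    {"entity_group": "SYMPTOM", "score": 0.9999765, "word": "потеря сознания", "start": 34, "end": 49},
    {"entity_group": "SPECIALITY", "score": 0.85958296, "word": "психиатра", "start": 100, "end": 109},
]

print(group_entities(sample, min_score=0.9))
# {'AGE': ['ребенка'], 'SYMPTOM': ['треога', 'потеря сознания']}
```

With the illustrative threshold of 0.9, the SPECIALITY span (score 0.86) is filtered out. The right threshold, if any, depends on the application.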

How to render

from IPython.display import HTML, display
from spacy import displacy

def convert_to_displacy_format(text, ner_results):
    """Convert Hugging Face NER pipeline output into the dict displacy expects in manual mode."""
    entities = [
        {"start": result["start"], "end": result["end"], "label": result["entity_group"]}
        for result in ner_results
    ]
    return {"text": text, "ents": entities, "title": None}

# `pipe` is the NER pipeline created in the previous snippet.
query = "У ребенка треога и норушения сна, потеря сознания, раньше ставили паническое расстройство, принимал атаракс. Подскажи хорошего психиатра в районе метро Октбрьской."
# For this query, lowercasing does not shift character positions,
# so the returned offsets still index into the original-case text.
ner_results = pipe(query.lower())
displacy_data = convert_to_displacy_format(query, ner_results)

# Background gradient per entity label.
colors = {
    "SPECIALITY": "linear-gradient(90deg, #aa9cfc, #fc9ce7)",
    "CITY": "linear-gradient(90deg, #feca57, #ff9f43)",
    "DRUG": "linear-gradient(90deg, #55efc4, #81ecec)",
    "DISEASE": "linear-gradient(90deg, #fab1a0, #ff7675)",
    "SUBWAY": "linear-gradient(90deg, #00add0, #0039a6)",
    "AGE": "linear-gradient(90deg, #f39c12, #e67e22)",
    "SYMPTOM": "linear-gradient(90deg, #e74c3c, #c0392b)",
    # The model also predicts GENDER; this gradient is an arbitrary choice.
    "GENDER": "linear-gradient(90deg, #a29bfe, #6c5ce7)",
}
# Only labels listed under "ents" are highlighted, so derive the list from the color map.
options = {"ents": list(colors), "colors": colors}

# jupyter=False makes displacy return the markup as a string instead of displaying it.
html = displacy.render(displacy_data, style="ent", manual=True, options=options, jupyter=False)

# Save a standalone HTML file.
with open("ner_visualization_with_colors.html", "w", encoding="utf-8") as f:
    f.write(html)

# Show the result inline when running in a notebook.
display(HTML(html))
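
Where spaCy or a notebook is not available, the same character offsets can drive a plain-text rendering. The sketch below is an illustrative alternative, not part of the original workflow. The helper `annotate` is hypothetical, and it assumes the spans do not overlap, as in the aggregated pipeline output shown earlier.

```python
def annotate(text, ner_results):
    """Return text with each entity span wrapped as [span](LABEL)."""
    parts = []
    cursor = 0
    for ent in sorted(ner_results, key=lambda e: e["start"]):
        start, end = ent["start"], ent["end"]
        parts.append(text[cursor:start])  # untouched text before the entity
        parts.append("[" + text[start:end] + "](" + ent["entity_group"] + ")")
        cursor = end
    parts.append(text[cursor:])  # trailing text after the last entity
    return "".join(parts)

# The first two spans from the example result, applied to the start of the query.
text = "У ребенка треога"
spans = [
    {"entity_group": "AGE", "start": 2, "end": 9},
    {"entity_group": "SYMPTOM", "start": 10, "end": 16},
]
print(annotate(text, spans))
# У [ребенка](AGE) [треога](SYMPTOM)
```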