TCMNER


Model description

TCMNER is a fine-tuned BERT-style model that is ready to use for Named Entity Recognition (NER) in Traditional Chinese Medicine (TCM). In our evaluation it achieves state-of-the-art performance on this NER task. It has been trained to recognize six types of entities: prescription (方剂), herb (本草), source (来源), disease (病名), symptom (症状), and syndrome (证型).

Specifically, this model is TCMRoBERTa (a RoBERTa model adapted to Traditional Chinese Medicine) fine-tuned on the Chinese version of Haiwei AI Lab's Named Entity Recognition dataset.

TCMRoBERTa is currently closed-source and used only within my company; it will be open-sourced in the future.

How to use

You can use this model with the Transformers pipeline for NER:

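from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("Monor/TCMNER")
model = AutoModelForTokenClassification.from_pretrained("Monor/TCMNER")

# Build an NER pipeline; without aggregation it returns one prediction per non-O token
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
# Sample record: Huazhi Decoction (化滞汤), its source (《证治汇补》, vol. 8), its herbs with doses, and its indication
example = "化滞汤,出处:《证治汇补》卷八。。组成:青皮20g,陈皮20g,厚朴20g,枳实20g,黄芩20g,黄连20g,当归20g,芍药20g,木香5g,槟榔8g,滑石3g,甘草4g。。主治:下痢因于食积气滞者。"

ner_results = nlp(example)
# Each result is a dict with keys such as "entity", "score", "index", "word", "start", "end"
print(ner_results)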
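The raw pipeline output is token-level. Below is a minimal sketch, assuming the IOB1 tag scheme listed under Training data (I-X continues an X entity; B-X starts a new X entity only when it directly follows another X entity), of merging those token predictions into entity spans using their character offsets. The helper group_entities and the sample predictions are hypothetical; the sample only mimics the "entity", "score", "start", and "end" keys the pipeline returns and is not real model output.

```python
from typing import Dict, List


def group_entities(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge token-level NER predictions into entity spans (hypothetical helper).

    Assumes IOB1-style tags: I-X continues an adjacent X entity, and B-X
    starts a new X entity that directly follows another X entity.
    """
    spans: List[Dict] = []
    for tok in sorted(tokens, key=lambda t: t["start"]):
        prefix, _, label = tok["entity"].partition("-")
        if not label:  # "O" (outside) tokens carry no entity type
            continue
        prev = spans[-1] if spans else None
        # Extend the previous span only for an I- tag of the same type
        # whose characters are contiguous with that span.
        if (
            prev is not None
            and prefix == "I"
            and prev["label"] == label
            and prev["end"] == tok["start"]
        ):
            prev["end"] = tok["end"]
            prev["scores"].append(tok["score"])
        else:
            spans.append(
                {"label": label, "start": tok["start"], "end": tok["end"], "scores": [tok["score"]]}
            )
    # Report each span with its surface text and the mean token score
    return [
        {
            "label": s["label"],
            "text": text[s["start"]:s["end"]],
            "start": s["start"],
            "end": s["end"],
            "score": sum(s["scores"]) / len(s["scores"]),
        }
        for s in spans
    ]


# Hypothetical token-level predictions for a short excerpt (not real model output)
text = "青皮20g,陈皮20g"
tokens = [
    {"entity": "I-本草", "score": 0.99, "start": 0, "end": 1},
    {"entity": "I-本草", "score": 0.97, "start": 1, "end": 2},
    {"entity": "I-本草", "score": 0.98, "start": 6, "end": 7},
    {"entity": "I-本草", "score": 0.96, "start": 7, "end": 8},
]
for ent in group_entities(text, tokens):
    print(ent["label"], ent["text"], ent["start"], ent["end"])
```

The offset check keeps two herbs such as 青皮 and 陈皮 apart even though the pipeline drops the O tokens between them by default. Alternatively, the Transformers token-classification pipeline accepts an aggregation_strategy argument (for example "simple") that groups tokens itself, which may be enough for most uses.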

Training data

This model was fine-tuned on my own dataset, annotated with the following tags:

Tag    | Description
O      | Outside of a named entity
B-方剂 | Beginning of a prescription entity that directly follows another prescription entity
I-方剂 | Prescription entity
B-本草 | Beginning of a herb entity that directly follows another herb entity
I-本草 | Herb entity
B-来源 | Beginning of a prescription source that directly follows another prescription source
I-来源 | Source of a prescription
B-病名 | Beginning of a disease name that directly follows another disease name
I-病名 | Disease name
B-症状 | Beginning of a symptom that directly follows another symptom
I-症状 | Symptom
B-证型 | Beginning of a syndrome that directly follows another syndrome
I-证型 | Syndrome

Eval results


Notices

  1. The model is free for commercial use.
  2. I do not plan to write a paper about this model. If you use any of its details in your paper, please acknowledge it. Thanks!

Bonus

All of our TCM domain models will be open-sourced soon, including:

  1. A series of pre-trained models
  2. Named entity recognition for TCM
  3. Text localization in images of ancient texts
  4. OCR for images of ancient texts

...and more.
