Chinese Named Entity Recognition for the Medical Domain

Project repository: https://github.com/iioSnail/chinese_medical_ner

Usage:

```python
from transformers import AutoModelForTokenClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('iioSnail/bert-base-chinese-medical-ner')
model = AutoModelForTokenClassification.from_pretrained("iioSnail/bert-base-chinese-medical-ner")

sentences = ["瘦脸针、水光针和玻尿酸详解!", "半月板钙化的病因有哪些?"]
# No [CLS]/[SEP] tokens are added, so each Chinese character here maps to one output position
inputs = tokenizer(sentences, return_tensors="pt", padding=True, add_special_tokens=False)
outputs = model(**inputs)
# Pick the most likely label for each token, then zero out the padding positions
outputs = outputs.logits.argmax(-1) * inputs['attention_mask']

print(outputs)
```

Output:

```
tensor([[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 4, 4],
        [1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 0, 0]])
```

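Here 1 = B (begin), 2 = I (inside), 3 = E (end) and 4 = O (outside). The sequence 1, 3 marks a two-character medical entity, 1, 2, 3 marks a three-character entity, 1, 2, 2, 3 marks a four-character entity, and so on. The trailing 0s in the second row are padding positions, which are zeroed out by multiplying the predictions by attention_mask.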

To convert these label ids into entity spans, you can use MedicalNerModel.format_outputs(sentences, outputs) from the project repository.

The result looks like this:

```
[
  [
    {'start': 0, 'end': 3, 'word': '瘦脸针'},
    {'start': 4, 'end': 7, 'word': '水光针'},
    {'start': 8, 'end': 11, 'word': '玻尿酸'}
  ],
  [
    {'start': 0, 'end': 5, 'word': '半月板钙化'}
  ]
]
```
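If you want to decode the label ids without importing the project code, the B/I/E/O scheme can be decoded with a short loop. The following is a minimal sketch and is not the project's MedicalNerModel.format_outputs. The helper name decode_bieo is hypothetical. The sketch assumes each character of the sentence occupies exactly one token position. That holds for the Chinese characters and punctuation in this example, but it may not hold for digits or Latin text, which the BERT tokenizer can merge into a single token.

```python
from typing import Dict, List, Union

# Hypothetical helper (not part of the project): decodes one row of label ids
# (1=B, 2=I, 3=E, 4=O, 0=padding after masking) into character spans.
# Assumes one token per character, as in the example sentences.
def decode_bieo(sentence: str, labels: List[int]) -> List[Dict[str, Union[int, str]]]:
    entities = []
    start = None
    for i, label in enumerate(labels[: len(sentence)]):
        if label == 1:  # B: open a new entity at this character
            start = i
        elif label == 3 and start is not None:  # E: close the open entity
            entities.append({"start": start, "end": i + 1, "word": sentence[start : i + 1]})
            start = None
        elif label == 2:  # I: still inside the current entity
            continue
        else:  # O, padding, or an E with no preceding B: discard any open entity
            start = None
    return entities


# Usage with the label ids from the tensor shown earlier
rows = [
    [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 4, 4],
    [1, 2, 2, 2, 3, 4, 4, 4, 4, 4, 4, 4, 0, 0],
]
sentences = ["瘦脸针、水光针和玻尿酸详解!", "半月板钙化的病因有哪些?"]
print([decode_bieo(s, r) for s, r in zip(sentences, rows)])
```

With these inputs, the sketch yields the same spans as the format_outputs result shown above. Truncating the labels to len(sentence) drops the padding positions before decoding.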

For more information, see the project repository: https://github.com/iioSnail/chinese_medical_ner
