---
license: apache-2.0
language:
- en
tags:
- Token Classification
co2_eq_emissions: 0.0279399890043426
widget:
- text: "MSH|^~&|SendingAPP|MYTEST|||20230621090000||ORU^R01|1|P|2.5.1||||||UNICODE PID|1||13579246^^^TEST||Taylor^Michael||19830520|M|||987 Pine St^^Anytown^NY^23456||555-456-7890 PV1|1||bc^^004 OBR|1||13579246|BCD^LEFT Breast Cancer Diagnosis^99MRC||20230621090000|||Taylor^Sarah||20230620090000|||N OBX|1|ST|FINDINGS^Findings^99MRC||Lab report shows asymmetric density in the right breast.|F|||R OBX|2|ST|IMPRESSION^Impression^99MRC||BIRADS category: 4 - Probably left side as issues.|F|||R OBX|3|ST|RECOMMENDATION^Recommendation^99MRC||Follow-up specialit visit in six months.|F|||R"
  example_title: "example 1"
- text: "MSH|^~&|SendingAPP|MYTEST|||20230621090000||ORU^R01|1|P|2.5.1||||||UNICODE PID|1||13579246^^^TEST||Taylor^Michael||19830520|M|||987 Pine St^^Anytown^NY^23456||555-456-7890 PV1|1||bc^^004 OBR|1||13579246|BCD^LEFT Breast Cancer Diagnosis^99MRC||20230621090000|||Taylor^Sarah||20230620090000|||N OBX|1|ST|FINDINGS^Findings^99MRC||Lab report shows asymmetric density in the right breast.|F|||R OBX|2|ST|IMPRESSION^Impression^99MRC||BIRADS category: 4 - Probably left side as issues.|F|||R OBX|3|ST|RECOMMENDATION^Recommendation^99MRC||Follow-up specialit visit in six months.|F|||R"
---

## About the Model

An English Named Entity Recognition (NER) model, trained on Maccrobat to recognize 107 biomedical entities in a given text corpus (case reports, etc.).
This model was built on top of distilbert-base-uncased.

- Dataset: Maccrobat https://figshare.com/articles/dataset/MACCROBAT2018/9764942
- Carbon emission: 0.0279399890043426 kg
- Training time: 30.16527 minutes
- GPU used: 1 x GeForce RTX 3060 Laptop GPU

Check out the tutorial video for an explanation of this model and the corresponding Python library: https://youtu.be/xpiDPdBpS18

## Usage

The easiest way to try the model is through the Inference API on Hugging Face. The second method is the pipeline object offered by the transformers library.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the tokenizer and the fine-tuned token classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("d4data/biomedical-ner-all")
model = AutoModelForTokenClassification.from_pretrained("d4data/biomedical-ner-all")

# aggregation_strategy="simple" merges sub-word tokens into whole entity spans
pipe = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")  # pass device=0 if using gpu

# Returns a list of dicts, one per detected entity span
pipe("""The patient reported no recurrence of palpitations at follow-up 6 months after the ablation.""")
```

## Author