ICD-10 DX Code Identification Model
Overview
This model is designed to identify tokens related to ICD-10 DX codes in clinical documents. We focus on a subset of more than 4,000 codes, the ones most frequently used in clinical documentation. Please refer to the config.json file for the target codes used to train this model.
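You can read the list of target codes programmatically. The sketch below assumes that the labels in config.json follow a BIO scheme (`B-<code>`, `I-<code>`, plus a single `O` tag). The card does not state this, so check the real `id2label` entries first. `target_codes` is a hypothetical helper, and the toy label map stands in for the actual file.

```python
import re

# Assumed label format: "B-<code>", "I-<code>" or "O" (not confirmed by the card).
_PREFIX = re.compile(r"^[BI]-")


def target_codes(id2label: dict) -> list[str]:
    """Return the sorted, de-duplicated ICD-10 codes found in a label map."""
    codes = set()
    for label in id2label.values():
        if label == "O":  # the "outside" tag carries no code
            continue
        codes.add(_PREFIX.sub("", label))  # strip the B-/I- prefix
    return sorted(codes)


# Toy label map standing in for the real config.json contents.
example = {0: "O", 1: "B-E11.9", 2: "I-E11.9", 3: "B-I10", 4: "I-I10"}
print(target_codes(example))  # ['E11.9', 'I10']
```

With network access, the real map should be available as `AutoConfig.from_pretrained("imperiumhf/imp_clinical_dxcode_ner_v2").id2label` from the `transformers` library.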
Model Details
- Type: Named Entity Recognition (NER)
- Target: ICD-10 DX Codes
- Code Subset: 4,000+ most common codes
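As a NER model, the network assigns one label per token, and adjacent tokens are then grouped into code mentions. The following minimal sketch shows that grouping step. It assumes BIO tags of the form `B-<code>`/`I-<code>`/`O`, which the card does not confirm. `decode_bio` is a hypothetical helper, and the tags in the example are invented rather than real model output.

```python
def decode_bio(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group BIO-tagged tokens into (code, text) spans."""
    spans = []
    current_code, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        # An I- tag that matches the open span extends it.
        if tag.startswith("I-") and current_code == tag[2:]:
            current_tokens.append(token)
            continue
        # Any other tag closes the open span, if there is one.
        if current_code is not None:
            spans.append((current_code, " ".join(current_tokens)))
            current_code, current_tokens = None, []
        # B- (or a stray I-) starts a new span; "O" starts nothing.
        if tag.startswith(("B-", "I-")):
            current_code, current_tokens = tag[2:], [token]
    if current_code is not None:
        spans.append((current_code, " ".join(current_tokens)))
    return spans


tokens = ["History", "of", "essential", "hypertension", "and", "asthma"]
tags = ["O", "O", "B-I10", "I-I10", "O", "B-J45.909"]  # invented predictions
print(decode_bio(tokens, tags))
# [('I10', 'essential hypertension'), ('J45.909', 'asthma')]
```

This sketch treats a stray `I-` tag with no matching open span as the start of a new span, a common lenient choice. A strict decoder would drop it instead.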
Dataset
The dataset comprises clinical documents annotated with ICD-10 DX codes. We ensure a balanced representation of the selected codes to prevent model bias. The dataset is private and was used internally to train the model.
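The card does not say how the balance is achieved. One common technique, shown here as a swapped-in illustration rather than the authors' method, is to check the label distribution and derive inverse-frequency class weights so that rarer codes count more during training. `inverse_frequency_weights` is a hypothetical helper, and the counts are invented.

```python
from collections import Counter


def inverse_frequency_weights(labels: list[str]) -> dict[str, float]:
    """Weight each code by total / (n_codes * count), so rarer codes weigh more."""
    counts = Counter(labels)
    total, n_codes = len(labels), len(counts)
    return {code: total / (n_codes * c) for code, c in counts.items()}


# Invented annotation counts for three codes.
labels = ["I10"] * 6 + ["E11.9"] * 3 + ["J45.909"] * 1
counts = Counter(labels)
print("imbalance ratio:", max(counts.values()) / min(counts.values()))  # 6.0
weights = inverse_frequency_weights(labels)
print({code: round(w, 3) for code, w in weights.items()})
# {'I10': 0.556, 'E11.9': 1.111, 'J45.909': 3.333}
```

Perfectly balanced data gives every code a weight of 1.0, so weights far from 1.0 are a quick signal that some codes are under- or over-represented.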
Training
Due to GPU memory constraints, training is conducted over multiple epochs, with periodic evaluation to monitor performance and mitigate overfitting.
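You can use periodic evaluation to stop training once validation performance stops improving. The following minimal, framework-free sketch of patience-based early stopping assumes validation loss as the monitored metric. The `EarlyStopping` class is hypothetical, and the loss values are invented.

```python
class EarlyStopping:
    """Signal a stop when validation loss fails to improve for `patience` evaluations."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_evals = 0

    def step(self, val_loss: float) -> bool:
        """Record one evaluation; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Invented validation losses from successive periodic evaluations.
stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([0.90, 0.72, 0.65, 0.66, 0.67, 0.60], start=1):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}; best loss {stopper.best}")
        break
# prints: stopping after epoch 5; best loss 0.65
```

When training with the Hugging Face `Trainer`, the library's `EarlyStoppingCallback` provides similar behaviour.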
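Usage
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("token-classification", model="imperiumhf/imp_clinical_dxcode_ner_v2")
# Each prediction is a dict with the token's label ("entity"), score, word and character offsets
predictions = pipe("Patient has a history of essential hypertension.")
```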
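To turn pipeline output into a list of codes, you can filter predictions by confidence. This sketch assumes the pipeline is called with `aggregation_strategy="simple"`, which returns dicts with `entity_group`, `score`, `word`, `start` and `end`. It also assumes that `entity_group` is the ICD-10 code itself, which the card does not confirm. `confident_codes` is a hypothetical helper, the 0.5 threshold is arbitrary, and the sample dicts are invented.

```python
def confident_codes(entities: list[dict], threshold: float = 0.5) -> list[str]:
    """Return unique codes, in order of first appearance, whose score meets the threshold."""
    codes = []
    for ent in entities:
        if ent["score"] >= threshold and ent["entity_group"] not in codes:
            codes.append(ent["entity_group"])
    return codes


# Invented output in the aggregated token-classification format.
sample = [
    {"entity_group": "I10", "score": 0.97, "word": "essential hypertension", "start": 11, "end": 33},
    {"entity_group": "J45.909", "score": 0.41, "word": "asthma", "start": 38, "end": 44},
    {"entity_group": "I10", "score": 0.88, "word": "hypertension", "start": 60, "end": 72},
]
print(confident_codes(sample, threshold=0.5))  # ['I10']
```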
Evaluation
Metrics are pending and will be added in a future update.
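No metrics are reported yet. For NER, entity-level precision, recall and F1 are a common choice. The following minimal sketch assumes exact-match scoring, where each entity is a `(doc_id, start, end, code)` tuple. `entity_prf` is a hypothetical helper, and the gold and predicted sets are invented; no results for this model are implied. Libraries such as `seqeval` offer similar scoring over BIO tag sequences.

```python
def entity_prf(gold: set, pred: set) -> tuple[float, float, float]:
    """Micro precision, recall and F1 over exactly matching entities."""
    tp = len(gold & pred)  # predicted entities that exactly match a gold entity
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented example: one correct prediction, one wrong code, one missed entity.
gold = {("doc1", 11, 33, "I10"), ("doc1", 38, 44, "J45.909"), ("doc2", 0, 10, "E11.9")}
pred = {("doc1", 11, 33, "I10"), ("doc1", 38, 44, "J45.20")}
print(tuple(round(x, 3) for x in entity_prf(gold, pred)))  # (0.5, 0.333, 0.4)
```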
Limitations and Considerations
- Overfitting risk due to repeated training on the same dataset.
- Balancing model complexity against the large number of target classes.
- Regular evaluation is needed to monitor performance over time.
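A quick calculation shows why the number of classes matters. This sketch assumes a BIO tag set (one `B-` and one `I-` tag per code plus a single `O`) and a linear classification head on a 768-dimensional encoder. The card states neither assumption. Under these assumptions, the 4,000+ codes become more than 8,000 output labels. `bio_head_params` is a hypothetical helper.

```python
def bio_head_params(n_codes: int, hidden_size: int) -> tuple[int, int]:
    """Label count and linear-head parameter count (weights + biases) for a BIO tag set."""
    n_labels = 2 * n_codes + 1  # B-<code> and I-<code> per code, plus "O"
    params = hidden_size * n_labels + n_labels
    return n_labels, params


# 4,000 codes (from this card) and an assumed hidden size of 768.
print(bio_head_params(4000, 768))  # (8001, 6152769)
```

Under these assumptions, the output layer alone has over six million parameters, and each label sees only a small share of the training examples. This is one reason overfitting and class balance are listed as concerns.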
Contact
Acknowledgements
All rights to this model are reserved by Imperium Software Solutions Pvt. Ltd.