ICD-10 DX Code Identification Model

Overview

This model identifies tokens related to ICD-10 DX codes in clinical documents. It covers a subset of more than 4,000 codes, chosen as the most frequently used in clinical documentation. Please refer to the config.json file for the target codes used to train this model.
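The target codes can be listed by reading the config file. The following is a minimal sketch, assuming config.json follows the usual Hugging Face layout with an `id2label` mapping. It also assumes labels may carry `B-`/`I-` prefixes and use `O` for tokens without a code; this label scheme is not confirmed by this card. The helper names `target_codes` and `load_target_codes` are hypothetical.

```python
import json


def target_codes(config: dict) -> list[str]:
    """Return the sorted, de-duplicated codes found in config["id2label"]."""
    codes = set()
    for label in config.get("id2label", {}).values():
        if label == "O":  # assumed "outside" tag for tokens with no code
            continue
        # Assumed BIO tagging: strip a leading "B-" or "I-" if present.
        if label[:2] in ("B-", "I-"):
            label = label[2:]
        codes.add(label)
    return sorted(codes)


def load_target_codes(path: str = "config.json") -> list[str]:
    """Read a local config.json and return its target codes."""
    with open(path, encoding="utf-8") as fh:
        return target_codes(json.load(fh))
```

Calling `load_target_codes()` in a directory containing the downloaded config.json should return the code list, provided the assumptions above hold.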

Model Details

Type: Named Entity Recognition (NER)
Target: ICD-10 DX Codes
Code Subset: 4,000+ most common codes

Dataset

The dataset comprises clinical documents annotated for ICD-10 DX codes. We ensure a balanced representation of the selected codes to prevent model bias. The dataset is private and is used internally to train the model.
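How balance is enforced is not described here. As an illustration only, one simple way to check it is to count annotations per code and compare the most and least frequent codes. The function name `code_balance` and the flat list-of-labels input are assumptions made for this sketch.

```python
from collections import Counter


def code_balance(labels: list[str]) -> tuple[Counter, float]:
    """Count annotations per code and return (counts, max/min frequency ratio).

    A ratio close to 1.0 means the codes are evenly represented.
    """
    counts = Counter(labels)
    if not counts:
        return counts, 0.0
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio
```

A large ratio would suggest that some codes dominate the annotations and that resampling or reweighting may be worth considering.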

Training

Due to GPU memory constraints, training is conducted in epochs with periodic evaluations to monitor performance and mitigate overfitting.
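The exact training loop is not published. As a generic sketch of how per-epoch evaluation can be used to limit overfitting, the helper below picks the checkpoint to keep using early stopping with a patience window. The name `best_epoch` and the `patience` parameter are hypothetical and not part of this model's training code.

```python
def best_epoch(eval_losses: list[float], patience: int = 2) -> int:
    """Return the index of the epoch whose checkpoint should be kept.

    Scans per-epoch evaluation losses in order and stops once the loss has
    failed to improve for `patience` consecutive epochs. Returns -1 if the
    list is empty.
    """
    best_idx, best_loss, stale = -1, float("inf"), 0
    for idx, loss in enumerate(eval_losses):
        if loss < best_loss:
            # New best: remember it and reset the stale counter.
            best_idx, best_loss, stale = idx, loss, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` epochs
    return best_idx
```

A larger `patience` tolerates noisier evaluation curves at the cost of more training time.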

Usage

Use a pipeline as a high-level helper:

from transformers import pipeline

pipe = pipeline("token-classification", model="imperiumhf/imp_clinical_dxcode_ner_v2")
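The pipeline returns one dictionary per predicted entity. The sketch below shows one possible way to reduce that output to a list of distinct codes above a confidence threshold. It assumes the predicted labels are the ICD-10 codes themselves and that `O` marks non-code tokens, neither of which is confirmed by this card. The helper `extract_codes`, the `min_score` default, and the example sentence are hypothetical.

```python
def extract_codes(entities: list[dict], min_score: float = 0.5) -> list[str]:
    """Return distinct labels with score >= min_score, in first-seen order.

    Accepts pipeline output with or without aggregation: aggregated results
    use the key "entity_group", raw token results use "entity".
    """
    seen, codes = set(), []
    for ent in entities:
        label = ent.get("entity_group") or ent.get("entity")
        if label is None or label == "O" or ent["score"] < min_score:
            continue
        if label not in seen:
            seen.add(label)
            codes.append(label)
    return codes


# Usage with the model (downloads the weights, so it is left commented out):
# from transformers import pipeline
# pipe = pipeline("token-classification",
#                 model="imperiumhf/imp_clinical_dxcode_ner_v2",
#                 aggregation_strategy="simple")
# print(extract_codes(pipe("Patient has type 2 diabetes and hypertension.")))
```

Passing `aggregation_strategy="simple"` asks the pipeline to merge sub-word tokens into entity spans, which usually makes the output easier to post-process.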

Evaluation

Metrics are pending and will be updated.

Limitations and Considerations

Repeated training on the same dataset carries a risk of overfitting.
Model complexity must be balanced against the large number of classes (4,000+ codes).
The model should be evaluated regularly to monitor its performance.

Contact

[email protected]

Acknowledgements

All rights to this model are reserved by Imperium Software Solutions Pvt. Ltd.