ICD-10 DX Code Identification Model

Overview

This model identifies tokens associated with ICD-10 diagnosis (DX) codes in clinical documents. It covers a subset of more than 4,000 codes, chosen as the most frequently used in clinical documentation. Refer to the config.json file for the target codes used to train this model.
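As a minimal sketch of how the target codes could be read from config.json: the snippet below assumes the file follows the standard Transformers layout, with an `id2label` mapping from label ids to label names. The helper name `load_target_labels` is hypothetical, and the exact label scheme (for example, whether labels carry B-/I- prefixes) is not stated in this card.

```python
import json
from pathlib import Path


def load_target_labels(config_path):
    """Return the label names under `id2label`, ordered by label id.

    Hypothetical helper; assumes a standard Transformers config.json.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    id2label = config.get("id2label", {})
    # JSON object keys are strings, so sort them numerically by id.
    ordered = sorted(id2label.items(), key=lambda item: int(item[0]))
    return [label for _, label in ordered]
```

Calling `load_target_labels("config.json")` on a locally downloaded copy of the file would return the full label list, which can then be inspected or counted.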

Model Details

  • Type: Named Entity Recognition (NER)
  • Target: ICD-10 DX Codes
  • Code Subset: 4,000+ most common codes

Dataset

The dataset comprises clinical documents annotated with ICD-10 DX codes. The selected codes are represented in balanced proportions to reduce model bias. The dataset is private and is used internally to train the model.
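Since the dataset is private, the following is only a sketch of how label balance might be checked on any similarly annotated corpus. The function names, the `"O"` outside label, and the toy tag sequences are assumptions for illustration, not details of the actual data.

```python
from collections import Counter


def label_frequencies(tag_sequences, outside_label="O"):
    """Count how often each non-outside label occurs across tagged documents."""
    counts = Counter()
    for tags in tag_sequences:
        counts.update(tag for tag in tags if tag != outside_label)
    return counts


def imbalance_ratio(counts):
    """Ratio of the most to the least frequent label; 1.0 means balanced."""
    if not counts:
        return 1.0
    return max(counts.values()) / min(counts.values())


# Toy example with illustrative labels (not taken from the private dataset).
toy_tags = [["O", "E11.9", "I10"], ["E11.9", "O"]]
toy_counts = label_frequencies(toy_tags)
```

A ratio far above 1.0 would suggest that some codes dominate the annotations and that resampling or weighting may be worth considering.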

Training

Because of GPU memory constraints, training is run epoch by epoch, with periodic evaluations to monitor performance and mitigate overfitting.
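This card does not say how the periodic evaluations are acted on. One common option is early stopping on the evaluation loss, sketched below in plain Python. The class name, the `patience` and `min_delta` parameters, and the loss values are all assumptions, not the training setup actually used.

```python
class EarlyStopping:
    """Signal a stop when evaluation loss has not improved for `patience` evaluations."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_evals = 0

    def should_stop(self, eval_loss):
        # An evaluation counts as an improvement only if it beats the best
        # loss seen so far by more than min_delta.
        if eval_loss < self.best_loss - self.min_delta:
            self.best_loss = eval_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Illustrative evaluation losses from four successive evaluations.
stopper = EarlyStopping(patience=2)
decisions = [stopper.should_stop(loss) for loss in [1.0, 0.8, 0.9, 0.85]]
```

In this toy run the loss stops improving after the second evaluation, so the stop signal is raised at the fourth.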

Use a pipeline as a high-level helper

from transformers import pipeline

pipe = pipeline("token-classification", model="imperiumhf/imp_clinical_dxcode_ner_v2")
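The pipeline returns a list of dictionaries, one per predicted token or entity. As a hedged sketch, the helper below groups those predictions by label. It assumes the usual token-classification output keys (`entity` or `entity_group`, `score`, `word`, `start`, `end`). The helper name, the score threshold, and the example labels are illustrative, and the labels this model actually emits may differ.

```python
def group_entities_by_label(text, entities, min_score=0.5):
    """Map each predicted label to the text spans assigned to it."""
    grouped = {}
    for ent in entities:
        if ent["score"] < min_score:
            continue  # drop low-confidence predictions
        # Aggregated output uses "entity_group"; raw output uses "entity".
        label = ent.get("entity_group", ent.get("entity"))
        start, end = ent.get("start"), ent.get("end")
        # Offsets can be missing with slow tokenizers; fall back to "word".
        span = text[start:end] if start is not None and end is not None else ent["word"]
        grouped.setdefault(label, []).append(span)
    return grouped


# Possible usage with the pipeline above (requires downloading the model):
# pipe = pipeline("token-classification",
#                 model="imperiumhf/imp_clinical_dxcode_ner_v2",
#                 aggregation_strategy="simple")
# text = "Patient has type 2 diabetes and hypertension."
# print(group_entities_by_label(text, pipe(text)))
```

Passing `aggregation_strategy="simple"` asks the pipeline to merge adjacent sub-word tokens into whole entities, which usually makes the grouped output easier to read.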

Evaluation

Evaluation metrics have not yet been published; this section will be updated.

Limitations and Considerations

  • Repeated training on the same dataset carries a risk of overfitting.
  • Model complexity must be balanced against the large number of classes (4,000+ codes).
  • Regular evaluation is needed to monitor model performance.

Contact

[email protected]

Acknowledgements

All rights to this model are reserved by Imperium Software Solutions Pvt. Ltd.

Model size: 155M params (Safetensors, tensor type F32)