---
library_name: transformers
tags:
- biobert
- medical-nlp
- icd-9
- classification
- healthcare
license: apache-2.0
language:
- en
base_model:
- dmis-lab/biobert-v1.1
pipeline_tag: text-classification
---

# Model Card for BioBERT Fine-tuned on MIMIC-III for ICD-9 Code Classification

## Model Details

### Model Description

This is a BioBERT model fine-tuned on the MIMIC-III (Medical Information Mart for Intensive Care III) dataset for ICD-9 code classification. Given clinical free text, such as an electronic health record (EHR) note or a description of symptoms, the model predicts the set of ICD-9 diagnostic codes that apply to it. Because a single document can carry several codes, the task is framed as multi-label classification.

- Developed by: [Researcher/Institution Name - to be added]
- Model type: Transformer encoder (BioBERT) with a multi-label sequence-classification head
- Language(s): English (medical domain)
- License: Apache 2.0
- Finetuned from model: [dmis-lab/biobert-v1.1](https://huggingface.co/dmis-lab/biobert-v1.1)

### Model Sources

- Repository: [GitHub/Model Repository Link - to be added]
- Paper: [Research Paper Link - to be added]

## Uses

### Direct Use

The primary use of this model is to classify clinical text, such as electronic health records, medical notes, or symptom descriptions, by predicting the ICD-9 diagnostic codes relevant to it.

### Downstream Use

This model can be integrated into:
- Clinical decision support systems
- Medical coding automation
- Electronic health record (EHR) analysis tools
- Healthcare informatics research

### Out-of-Scope Use

- The model should not be used for medical diagnosis without professional medical oversight
- It is not intended to replace clinical judgment
- Performance is likely to degrade on non-clinical text or on text that differs substantially from the training corpus (for example, notes written outside the intensive care setting)

## Bias, Risks, and Limitations

- The model can only predict the ICD-9 codes in its label set, and its behavior reflects the conditions and coding practices found in MIMIC-III, which consists of intensive care unit (ICU) records from a single US hospital
- Biases present in the original training data may carry over into the predictions
- Accuracy can be affected by variations in medical terminology and writing style, and by complex medical cases

### Recommendations

- Have medical professionals validate model predictions
- Use the model as a supportive tool, not a replacement for expert medical assessment
- Regularly re-evaluate performance on new datasets
- Be aware of potential demographic or contextual biases in the predictions

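## How to Get Started with the Model

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the model and tokenizer.
# 'model_path' is a placeholder: use this model's Hub repo ID or a local directory.
model = AutoModelForSequenceClassification.from_pretrained('model_path')
tokenizer = AutoTokenizer.from_pretrained('model_path')

def predict_icd9_codes(input_text, threshold=0.8):
    """Return the ICD-9 codes whose predicted probability exceeds `threshold`."""
    # Tokenize the input, truncating to BERT's 512-token limit
    inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512, padding='max_length')

    # Run the model without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Multi-label setup: an independent sigmoid per code, not a softmax across codes.
    # [0] selects the single input in the batch.
    probs = torch.sigmoid(outputs.logits)[0]

    # Keep every code above the threshold; .tolist() turns the tensor indices
    # into the plain int keys that model.config.id2label uses
    indices = (probs > threshold).nonzero(as_tuple=True)[0].tolist()
    return [model.config.id2label[i] for i in indices]

# Example usage
print(predict_icd9_codes("Patient presents with chest pain and shortness of breath."))
```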
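With a fixed threshold of 0.8, `predict_icd9_codes` can return an empty list for a note on which the model is unsure, and it discards the scores. The hypothetical helper below sketches one way to post-process the probabilities instead: it returns `(code, probability)` pairs sorted by score and can optionally fall back to the top `min_codes` codes. It works on plain Python lists, so you would pass it `torch.sigmoid(outputs.logits)[0].tolist()` and `model.config.id2label`. The ICD-9 codes in the usage lines are illustrative and may not match the exact label strings this model uses.

```python
from typing import Dict, List, Sequence, Tuple


def rank_codes(
    probs: Sequence[float],
    id2label: Dict[int, str],
    threshold: float = 0.8,
    min_codes: int = 0,
) -> List[Tuple[str, float]]:
    """Return (code, probability) pairs above `threshold`, highest first.

    `min_codes` is a hypothetical fallback: if fewer codes pass the threshold,
    the `min_codes` highest-scoring codes are returned regardless of score.
    """
    # Sort label indices by probability, highest first
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected = [i for i in order if probs[i] > threshold]
    if len(selected) < min_codes:
        selected = order[:min_codes]
    return [(id2label[i], float(probs[i])) for i in selected]


# Illustrative label map and probabilities (not real model output)
id2label = {0: "401.9", 1: "428.0", 2: "250.00"}
print(rank_codes([0.91, 0.42, 0.85], id2label))             # → [('401.9', 0.91), ('250.00', 0.85)]
print(rank_codes([0.3, 0.6, 0.1], id2label, min_codes=1))   # → [('428.0', 0.6)]
```

Lowering `threshold` generally trades precision for recall, so it is worth tuning on held-out data rather than keeping 0.8 by default.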

## Training Details

### Training Data

- Dataset: MIMIC-III
- Domain: Medical/clinical text
- Content: Electronic health records (EHRs)

### Training Procedure

#### Preprocessing
- Text tokenization
- Maximum sequence length: 512 tokens
- Padding to a uniform length
- Possibly additional text normalization (details to be specified)
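In the getting-started example, the tokenizer call (`truncation=True, max_length=512, padding='max_length'`) handles truncation and padding. As an illustration of what those steps produce, here is a pure-Python sketch of BERT-style framing, truncation, and padding applied to word-piece IDs. The function name is hypothetical, and the special-token IDs (`[CLS]`=101, `[SEP]`=102, `[PAD]`=0) are assumptions taken from the BERT-base cased vocabulary that BioBERT v1.1 builds on. Read the real values from `tokenizer.cls_token_id`, `tokenizer.sep_token_id` and `tokenizer.pad_token_id`.

```python
from typing import Dict, List

# Assumed special-token IDs (BERT-base cased vocabulary); verify against the tokenizer
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def truncate_and_pad(token_ids: List[int], max_length: int = 512) -> Dict[str, List[int]]:
    """Build fixed-length BERT inputs from word-piece IDs that lack special tokens."""
    # Reserve two positions for [CLS] and [SEP], then cut whatever does not fit
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    # 1 marks real tokens, 0 marks padding that attention should ignore
    attention_mask = [1] * len(input_ids)
    n_pad = max_length - len(input_ids)
    input_ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad
    return {"input_ids": input_ids, "attention_mask": attention_mask}


print(truncate_and_pad([5, 6, 7], max_length=8))
```

Padding every input to 512 positions keeps tensor shapes fixed, but it wastes compute on short notes. At inference time, padding only to the longest sequence in a batch (`padding=True` in the tokenizer call) is a common alternative.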

#### Training Hyperparameters
- Base model: BioBERT (`dmis-lab/biobert-v1.1`)
- Training regime: Fine-tuning
- Precision: [Specify training precision, e.g., mixed precision]
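The card does not state the training objective. A common choice for multi-label fine-tuning is binary cross-entropy on each code's logit (`torch.nn.BCEWithLogitsLoss`). It is also the loss `AutoModelForSequenceClassification` applies when the config's `problem_type` is `"multi_label_classification"`. The sketch below assumes that objective and writes it out in plain Python for a single document. The function name is hypothetical.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy over codes, computed from raw logits.

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)),
    which equals -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))].
    """
    if not logits or len(logits) != len(targets):
        raise ValueError("logits and targets must be non-empty and the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# One document, three codes: the first is present, the other two are absent
print(bce_with_logits([2.0, -3.0, 0.5], [1.0, 0.0, 0.0]))
```

Each code is scored independently, so a note can be pushed toward any number of codes at once. A softmax cross-entropy, by contrast, would force the codes to compete for a single label.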

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
- Held-out subset of MIMIC-III
- Diverse medical cases and documentation styles

#### Metrics
- Precision
- Recall
- F1-score
- Additional multi-label classification metrics (to be specified)
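For multi-label ICD-9 coding, precision, recall, and F1 are usually reported either as micro averages (counts pooled over all codes) or as macro averages (per-code scores averaged). The two can differ substantially when some codes are far more frequent than others. The card does not say which averaging was used. The sketch below computes both from the gold and predicted code sets of each document. The function names are hypothetical. Macro-F1 here is the mean of per-code F1 scores, which is one of two conventions in use.

```python
from typing import Dict, List, Set, Tuple


def _prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def multilabel_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, float]:
    """Micro- and macro-averaged precision, recall and F1 over per-document code sets."""
    codes = set().union(*gold, *pred)
    # Per-code [true positives, false positives, false negatives]
    counts = {c: [0, 0, 0] for c in codes}
    for g, p in zip(gold, pred):
        for c in p & g:
            counts[c][0] += 1
        for c in p - g:
            counts[c][1] += 1
        for c in g - p:
            counts[c][2] += 1

    # Micro: pool the counts over all codes, so frequent codes dominate
    micro = _prf(*(sum(v[i] for v in counts.values()) for i in range(3)))
    # Macro: average the per-code scores, so every code weighs the same
    per_code = [_prf(*v) for v in counts.values()] or [(0.0, 0.0, 0.0)]
    macro = [sum(s[i] for s in per_code) / len(per_code) for i in range(3)]

    names = ("precision", "recall", "f1")
    result = {f"micro_{n}": micro[i] for i, n in enumerate(names)}
    result.update({f"macro_{n}": macro[i] for i, n in enumerate(names)})
    return result
```

Consider gold sets `[{"a"}, {"a"}, {"b"}]` and predictions `[{"a"}, {"a"}, set()]`. Micro-F1 is 0.8 but macro-F1 is 0.5, because the missed code `b` counts as much as the well-predicted `a` in the macro average.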

## Environmental Impact

- Estimated carbon emissions: to be calculated
- Compute details: to be specified

## Technical Specifications

### Model Architecture
- Base model: BioBERT
- Task: Multi-label ICD-9 code classification
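In the multi-label setting, each document's training target is a multi-hot vector with one position per code in the label set. `id2label` (used in the getting-started example) maps those positions back to codes. The sketch below builds the maps and the targets. It assumes that the label set is simply every distinct code seen in training, and the function names are hypothetical. Targets are floats because `BCEWithLogitsLoss` expects floating-point targets.

```python
from typing import Dict, Iterable, List, Tuple


def build_label_maps(all_codes: Iterable[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Assign each distinct ICD-9 code a stable integer ID (sorted order)."""
    labels = sorted(set(all_codes))
    id2label = {i: code for i, code in enumerate(labels)}
    label2id = {code: i for i, code in id2label.items()}
    return id2label, label2id


def multi_hot(codes: Iterable[str], label2id: Dict[str, int]) -> List[float]:
    """Encode one document's codes as a 0/1 vector; codes outside the label set are skipped."""
    vec = [0.0] * len(label2id)
    for code in codes:
        idx = label2id.get(code)
        if idx is not None:
            vec[idx] = 1.0
    return vec


id2label, label2id = build_label_maps(["428.0", "401.9", "250.00", "401.9"])
print(id2label)                                 # → {0: '250.00', 1: '401.9', 2: '428.0'}
print(multi_hot(["401.9", "999.9"], label2id))  # → [0.0, 1.0, 0.0]
```

When fine-tuning from `dmis-lab/biobert-v1.1`, maps like these can be passed to `AutoModelForSequenceClassification.from_pretrained` as `num_labels=len(id2label), id2label=id2label, label2id=label2id, problem_type="multi_label_classification"`. Both `id2label` and the multi-label loss are then stored in the saved config.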

## Citation

[Citation information to be added when research is published]

## More Information

For more details about the model's development, performance, and usage, please contact the model developers.