🎯 Our Goal & Future Use Cases

DualMedBERT is an early-stage text classification model that categorizes patient-reported health and drug conditions. It was trained on the UCI Drug Review Dataset, which contains patient reviews sourced from drugs.com.

Unstructured patient-written text is everywhere: on health forums, on review sites like drugs.com, and in clinic intake forms. Our goal with DualMedBERT is to process this messy text to help researchers and analysts:

  • Analyze Adverse Drug Effects: Quickly sort patient reviews to figure out how different demographics are reacting to certain medications.
  • Track Disease Trends: Automatically categorize thousands of forum posts or clinic notes to see what conditions are trending in a specific region or dataset.

Note: These specific analytical use cases are part of our future roadmap. The model is currently under active development.

⚠️ Important Limitations

  • Not for Diagnosis: This model is strictly an analytical tool designed for research and data structuring. It is never to be used for direct medical diagnosis, advice, or patient treatment.
  • Limited Scope: The current version is only trained to recognize 27 specific diseases/conditions. If a text describes a condition outside this list, the model cannot predict it accurately. We plan to expand its capacity in future iterations.

🧠 DualMedBERT: Dual-Teacher Distilled Biomedical Classifier

⚠️ Testing Phase Notice: DualMedBERT is currently in an active testing phase. The present experiments cover 27 disease classes from the UCI Drug Review dataset, which represents a relatively small and focused slice of the clinical NLP landscape. We intend to extend testing across many more disease categories and significantly larger sample sizes in future iterations. The current dataset size is limited, which contributes to mild overfitting observed in the student model. At scale, with more diverse classes and substantially more training data, we expect the model's generalization ability and real-world reliability to improve considerably. Results reported here reflect this early-stage evaluation.


We present DualMedBERT, a lightweight and reliable text classification framework for disease prediction from patient-reported health conditions. The proposed approach introduces a dual-teacher knowledge distillation pipeline that transfers complementary knowledge from a general-domain language model (BERT-base) and a domain-specific biomedical model (PubMedBERT) into a compact DistilBERT student enhanced with LoRA-based adaptation.

The student model is trained using a combination of focal loss and entropy-weighted dual-teacher distillation, enabling efficient learning under class imbalance while leveraging both linguistic and domain-specific representations. To further improve real-world usability, we incorporate a post-hoc XGBoost-based calibration module that estimates prediction reliability using softmax-derived features.

Experiments on a 27-class disease classification task using patient-reported health data show that DualMedBERT achieves a Macro F1 of 0.8432 and an accuracy of 84.4%. This is slightly above BERT-base (Macro F1 0.8333, accuracy 83.5%) and slightly below PubMedBERT (Macro F1 0.8553, accuracy 85.5%), while reducing inference latency by ~1.6–1.8× (encoder: 10.13 ms, end-to-end: 11.06 ms). The calibration module achieves an AUROC of 0.8847 with a calibration accuracy of 83.33%. Because it runs post hoc, it adds a confidence estimate without affecting classification performance.

These results show that carefully designed distillation and calibration strategies can yield efficient, accurate, and reliable models suitable for deployment in real-world healthcare-related NLP applications.


πŸ₯ Use Case / Applications

DualMedBERT is designed for real-world disease classification from patient-reported health conditions, where inputs are often unstructured, noisy, and linguistically diverse.

πŸ” Potential Applications

  • Clinical decision support (assistive, not diagnostic)
    Classifying patient-reported symptoms into likely disease categories to assist healthcare professionals.

  • Telemedicine and triage systems
    Rapidly analyzing patient descriptions to prioritize cases or suggest next steps.

  • Health forums and patient platforms
    Automatically categorizing user-reported conditions for better organization and information retrieval.

  • Public health monitoring
    Aggregating and analyzing trends in reported symptoms across populations.


⚠️ Important Note

This model is intended for research and assistive purposes only and should not be used for medical diagnosis or treatment decisions without professional oversight.


💡 Why This Matters

Patient-reported health data differs from clinical text:

  • Informal language
  • Symptom descriptions instead of diagnoses
  • Ambiguity and overlap across conditions

DualMedBERT addresses this by combining:

  • General language understanding (BERT)
  • Biomedical domain knowledge (PubMedBERT)
  • Efficient deployment (DistilBERT + LoRA)
  • Reliability estimation (XGBoost calibration)

🧩 Model Architecture

Student Model

| Component | Detail |
| --- | --- |
| Backbone | distilbert-base-uncased |
| LoRA Rank | r = 8 |
| LoRA Alpha | α = 32 |
| LoRA Dropout | 0.05 |
| LoRA Applied To | Layers 2–5 |
| Layer 1 | Partially unfrozen |
| Pooling | CLS token + attention pooling |
| Classifier Head | Dense → 27 disease classes |
| Max Sequence Len | 256 tokens |
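To give a feel for what the LoRA settings in the table imply, the following sketch works out the standard LoRA scaling factor (alpha / r) and the number of trainable parameters added per adapted weight matrix. It assumes the adapters wrap square 768 × 768 projections (768 is the hidden size of distilbert-base-uncased). The model card does not say which projections are adapted, so the function names and the per-matrix framing are illustrative only.

```python
# Assumption (not stated in the model card): each LoRA adapter wraps a square
# hidden_size x hidden_size projection inside a DistilBERT layer.
HIDDEN_SIZE = 768   # hidden size of distilbert-base-uncased
LORA_RANK = 8       # r, from the table above
LORA_ALPHA = 32     # alpha, from the table above


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update B @ A before it is added to W."""
    return alpha / rank


def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * d_in + d_out * rank


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
adapter_params = lora_params_per_matrix(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
frozen_params = HIDDEN_SIZE * HIDDEN_SIZE

print(f"LoRA scaling (alpha / r): {scaling}")
print(f"Trainable params per adapted matrix: {adapter_params}")
print(f"Frozen params in that matrix: {frozen_params}")
print(f"Trainable fraction: {adapter_params / frozen_params:.2%}")
```

Under these assumptions each adapted matrix adds 12,288 trainable parameters next to 589,824 frozen ones, which is why LoRA keeps the student cheap to fine-tune.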

Teachers

| Teacher | Checkpoint | Role |
| --- | --- | --- |
| BERT-base | google-bert/bert-base-uncased | General language understanding |
| PubMedBERT | microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext | Biomedical domain knowledge |

🧠 Training Method

Dual-Teacher Knowledge Distillation

The total training loss combines knowledge distillation from two teachers with focal classification loss:

L = α · L_KD_BERT + β · L_KD_PubMed + (1 - α - β) · L_Focal

Where:

  • KD uses two teachers in parallel
  • Teacher weights are determined via entropy-based confidence (adaptive weighting)
  • α (KD weight, BERT teacher): 0.4
  • β (KD weight, PubMedBERT teacher): 0.5
  • KD Temperature (T): 3.5
  • The remaining weight (1 - 0.4 - 0.5 = 0.1) is applied to the focal loss
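The loss above can be sketched for a single example in plain Python. This is a minimal illustration, not the training code: it assumes the KD terms are KL divergences between temperature-softened teacher and student distributions scaled by T² (the convention from Hinton et al.), and it assumes a focal-loss gamma of 2.0, which the card does not state. The card also does not specify how the entropy-based adaptive weights interact with the fixed α and β, so `entropy_confidence_weights` shows only one possible scheme; all function names here are hypothetical.

```python
import math

ALPHA, BETA, TEMPERATURE = 0.4, 0.5, 3.5  # values from the list above
FOCAL_GAMMA = 2.0                         # assumption: gamma is not given in the card
EPS = 1e-12                               # guards against log(0)


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def kd_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, scaled by T^2 (assumed)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / max(s, EPS))
             for t, s in zip(p_teacher, p_student) if t > 0)
    return kl * temperature ** 2


def focal_loss(student_logits, target, gamma=FOCAL_GAMMA):
    """Focal loss for one example: down-weights already well-classified targets."""
    p_true = max(softmax(student_logits)[target], EPS)
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def total_loss(student_logits, bert_logits, pubmed_logits, target):
    """L = alpha * KD_BERT + beta * KD_PubMed + (1 - alpha - beta) * Focal."""
    return (ALPHA * kd_loss(student_logits, bert_logits, TEMPERATURE)
            + BETA * kd_loss(student_logits, pubmed_logits, TEMPERATURE)
            + (1.0 - ALPHA - BETA) * focal_loss(student_logits, target))


def entropy_confidence_weights(bert_logits, pubmed_logits):
    """One possible adaptive scheme: the lower-entropy teacher gets more weight."""
    w_bert = math.exp(-entropy(softmax(bert_logits)))
    w_pubmed = math.exp(-entropy(softmax(pubmed_logits)))
    total = w_bert + w_pubmed
    return w_bert / total, w_pubmed / total


# Toy 3-class example (the real model has 27 classes).
student = [0.1, 0.2, 0.3]
bert_teacher = [3.0, 0.0, 0.0]
pubmed_teacher = [2.5, 0.5, 0.0]
print("total loss:", total_loss(student, bert_teacher, pubmed_teacher, target=0))
print("adaptive weights:", entropy_confidence_weights(bert_teacher, pubmed_teacher))
```

When the student matches both teachers exactly, both KD terms vanish and only the focal term (weight 0.1) remains, which is a quick sanity check on any implementation of this loss.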

📊 Confidence Calibration (XGBoost)

A post-hoc calibrator predicts whether each student prediction is likely to be correct, so that uncertain predictions can be flagged.

Features Used (31 total):

| Feature Group | Details |
| --- | --- |
| Softmax probabilities | All 27 class-wise softmax outputs |
| Max probability | max(softmax), the confidence in the top prediction |
| Entropy | Shannon entropy over the softmax distribution |
| Top-2 gap | Difference between top-1 and top-2 softmax values |
| Top-3 sum | Sum of top-3 softmax probabilities |
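The 31 features (27 probabilities plus 4 summary statistics) can be assembled from a single softmax vector as sketched below. The feature order and the use of the natural logarithm for entropy are assumptions, since the card does not specify them, and `calibration_features` is a hypothetical helper name.

```python
import math

NUM_CLASSES = 27


def calibration_features(probs):
    """Build the 31-dimensional calibrator input from one softmax vector.

    Assumed order: 27 class probabilities, max probability, Shannon entropy
    (natural log), top-2 gap, top-3 sum.
    """
    if len(probs) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} probabilities, got {len(probs)}")
    ranked = sorted(probs, reverse=True)
    max_prob = ranked[0]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    top2_gap = ranked[0] - ranked[1]
    top3_sum = sum(ranked[:3])
    return list(probs) + [max_prob, entropy, top2_gap, top3_sum]


# A confident prediction: most of the mass on one class, the rest spread evenly.
confident = [0.74] + [0.01] * 26
features = calibration_features(confident)
print(len(features), features[27:])
```

A confident distribution yields a high max probability, low entropy, and a large top-2 gap, while a near-uniform one yields the opposite; these are the signals the calibrator can learn from.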

📈 Results

Note: These results are from the current testing phase on the UCI Drug Review dataset (27 disease classes). Results may improve significantly with more data and broader disease coverage. Mild overfitting is observed due to limited dataset size.

Classification Performance

| Model | Macro F1 | Accuracy | Latency (Encoder) | Latency (End-to-End) |
| --- | --- | --- | --- | --- |
| BERT-base | 0.8333 | 0.835 | ~16–18 ms | ~16–18 ms |
| PubMedBERT | 0.8553 | 0.855 | ~16–18 ms | ~16–18 ms |
| DualMedBERT ✅ | 0.8432 | 0.8440 | 10.13 ms | 11.06 ms |

DualMedBERT achieves a higher Macro F1 than BERT-base (0.8432 vs. 0.8333) while running with roughly 1.6–1.8× lower encoder latency than the teacher models.
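The speedup factor follows directly from the latencies in the table. The short calculation below divides the teachers' reported ~16–18 ms range by the student's latency; the helper name `speedup_range` is illustrative.

```python
# Latencies taken from the classification performance table (milliseconds).
TEACHER_LATENCY_MS = (16.0, 18.0)   # reported range for both teachers
STUDENT_ENCODER_MS = 10.13
STUDENT_END_TO_END_MS = 11.06


def speedup_range(teacher_range_ms, student_ms):
    """Return (low, high) speedup of the student over the teacher range."""
    low, high = teacher_range_ms
    return low / student_ms, high / student_ms


encoder_speedup = speedup_range(TEACHER_LATENCY_MS, STUDENT_ENCODER_MS)
end_to_end_speedup = speedup_range(TEACHER_LATENCY_MS, STUDENT_END_TO_END_MS)

print("encoder speedup: %.2fx to %.2fx" % encoder_speedup)
print("end-to-end speedup: %.2fx to %.2fx" % end_to_end_speedup)
```

With these numbers the encoder comparison gives about 1.58× to 1.78×, matching the quoted ~1.6–1.8×. The end-to-end comparison, which includes the student's extra post-processing, works out to about 1.45× to 1.63×.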


πŸ” Calibration Performance

| Metric | Value |
| --- | --- |
| Calibration AUROC | 0.8847 |
| Calibration Accuracy | 83.33% |

The XGBoost calibrator estimates when the student's prediction is likely to be wrong, which lets downstream systems flag low-confidence outputs for human review.
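A downstream system could use the calibrator's output roughly as follows. This is a sketch under assumptions: it takes the calibrator's estimated probability that a prediction is correct as a plain float, and the 0.5 review threshold, the `RoutedPrediction` type, and `route_prediction` are all hypothetical. A real deployment would tune the threshold on held-out data.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.5  # hypothetical cut-off; tune on a validation set


@dataclass
class RoutedPrediction:
    label: str
    p_correct: float     # calibrator's estimated probability of being correct
    needs_review: bool   # True when the prediction should go to a human


def route_prediction(label: str, p_correct: float,
                     threshold: float = REVIEW_THRESHOLD) -> RoutedPrediction:
    """Flag a prediction for human review when the calibrator is not confident."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability in [0, 1]")
    return RoutedPrediction(label, p_correct, needs_review=p_correct < threshold)


# Illustrative values only; these are not outputs of the released model.
for label, p in [("Migraine", 0.91), ("GERD", 0.32)]:
    print(route_prediction(label, p))
```

Raising the threshold sends more predictions to review and lowers the risk of acting on a wrong label; lowering it does the reverse.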


βš™οΈ Training Details

| Hyperparameter | Value |
| --- | --- |
| Optimizer | AdamW |
| Learning Rate (student) | 1.5e-4 |
| Weight Decay | 0.1 |
| Epochs | 12 |
| Early Stopping Patience | 3 |
| KD Temperature (T) | 3.5 |
| KD Alpha (BERT weight) | 0.4 |
| KD Beta (PubMedBERT weight) | 0.5 |
| LoRA Dropout | 0.05 |
| Max Sequence Length | 256 |

🏷️ Supported Disease Classes (27)

| ID | Disease |
| --- | --- |
| 0 | Abnormal Uterine Bleeding |
| 1 | Allergic Rhinitis |
| 2 | Bacterial Infection |
| 3 | Benign Prostatic Hyperplasia |
| 4 | Constipation |
| 5 | Diabetes, Type 2 |
| 6 | Endometriosis |
| 7 | Erectile Dysfunction |
| 8 | GERD |
| 9 | Hepatitis C |
| 10 | High Blood Pressure |
| 11 | High Cholesterol |
| 12 | HIV Infection |
| 13 | Hyperhidrosis |
| 14 | Fibromyalgia |
| 15 | Irritable Bowel Syndrome |
| 16 | Migraine |
| 17 | Migraine Prevention |
| 18 | Multiple Sclerosis |
| 19 | Osteoarthritis |
| 20 | Overactive Bladder |
| 21 | Psoriasis |
| 22 | Restless Legs Syndrome |
| 23 | Rheumatoid Arthritis |
| 24 | Sinusitis |
| 25 | Urinary Tract Infection |
| 26 | Vaginal Yeast Infection |

📂 Repository Structure

DualMedBert/
├── README.md                    # This file
├── config.json                  # Full model and training configuration
├── label_map.json               # Class ID → disease name mapping
├── student_weights.pt           # Trained student model weights
├── tokenizer.json               # Student tokenizer
├── tokenizer_config.json        # Tokenizer configuration
├── vocab.txt                    # Vocabulary file
├── special_tokens_map.json      # Special token definitions
├── xgb_calibrator.json          # Trained XGBoost calibration model
├── temperature_scaler.joblib    # Temperature scaling object (post-hoc)
├── bert_teacher/                # Fine-tuned BERT-base teacher
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer.json
│   └── ...
├── pubmed_teacher/              # Fine-tuned PubMedBERT teacher
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer.json
│   └── ...
└── plots/                       # Evaluation and analysis figures
    ├── fig1_kd_training_dynamics.png
    ├── fig2_model_comparison.png
    ├── fig3_per_class_f1.png
    ├── fig4_confusion_matrix.png
    ├── fig5_calibrator_analysis.png
    ├── fig6_loss_decomposition.png
    └── fig_shap_importance.png

⚠️ Important Notes & Limitations

  • Current testing phase: Results are based on a single dataset (UCI Drug Reviews, 27 classes) with limited samples. The model shows mild overfitting attributable to the small dataset size.
  • Planned expansion: We intend to test DualMedBERT across many more disease classes and with significantly larger datasets. Broader data is expected to unlock better generalization and substantially stronger real-world performance.
  • Adaptive teacher weights: Teacher confidence weights showed limited dynamic variation (~0.45 / 0.55) during training, suggesting both teachers contribute fairly consistently across the dataset.
  • Speed–accuracy tradeoff: The model is designed to prioritize speed and reliability while maintaining competitive classification accuracy relative to its teachers.
  • Not for diagnosis: This model is for research and assistive purposes only. It should not be used as a substitute for professional medical judgment.

📂 Dataset

UCI Drug Review Dataset (Gräßer et al., 2018)
Patient-written drug reviews paired with condition labels. Reviews are informal, symptom-rich, and linguistically diverse, which makes this an appropriate benchmark for patient-reported health classification.


📚 Citation

If you use DualMedBERT, please cite the following relevant works:

  • Hinton et al. (2015). Distilling the Knowledge in a Neural Network (knowledge distillation)
  • Hu et al. (2022). LoRA: Low-Rank Adaptation of Large Language Models
  • Sanh et al. (2019). DistilBERT, a distilled version of BERT
  • Devlin et al. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • Gu et al. (2021). Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing (PubMedBERT)
  • Lin et al. (2017). Focal Loss for Dense Object Detection
  • Chen & Guestrin (2016). XGBoost: A Scalable Tree Boosting System
  • Gräßer et al. (2018). Aspect-Based Sentiment Analysis of Drug Reviews Applying Cross-Domain and Cross-Data Learning (UCI Drug Review Dataset)

🏁 Summary

DualMedBERT demonstrates that a carefully designed dual-teacher distillation pipeline can:

✅ Outperform BERT-base in Macro F1 (0.8432 vs. 0.8333) on the current test set
✅ Achieve ~1.6–1.8× lower inference latency (10.13 ms encoder / 11.06 ms end-to-end)
✅ Provide reliable confidence estimation via XGBoost calibration (AUROC: 0.8847, Accuracy: 83.33%)
⏳ Under active expansion: future work will cover more disease classes and larger datasets for improved generalization

