🦷 DentaInstruct-1.2B

DentaInstruct-1.2B is a fine-tuned instruction-following language model designed to assist with dental domain queries. It was trained using clinically focused questions and answers from the miriad/miriad-4.4M dataset, specifically filtered for the Dental & Oral Medicine specialty. The base model used is LiquidAI/LFM2-1.2B, optimized with the Unsloth fine-tuning library.

Model Details

Architecture: LFM2-1.2B (decoder-only transformer)
Base Model: LiquidAI/LFM2-1.2B
Fine-tuning Library: Unsloth
Instruction Format: Chat-style, formatted using apply_chat_template
Trained On: Dental subset of miriad-4.4M
Compute: Trained using Google Colab T4 (free tier)

Benchmark Summary

The model was evaluated on a curated set of dental prompts with rich terminology across endodontics, periodontics, prosthodontics, and oral surgery. Responses were assessed for fluency, accuracy, and domain relevance.

✅ Terminology Handling: Excellent coverage of dental-specific terms
✅ Instruction Following: Clear, context-aware responses
✅ Answer Structure: Consistently professional and clinically coherent
⚠️ Minor Hallucinations: A few outputs demonstrated factual drift in rare cases

Limitations & Warnings

This model was fine-tuned using the MIRIAD dataset, which comes with the following caution:

This model is trained on data that has not been manually reviewed by medical experts. It should not be used for diagnostic purposes or to inform medical decision-making. It is intended for research and educational purposes only.

Not a substitute for professional dental care
Do not use this model for clinical diagnosis or treatment advice

Intended Use

Educational Q&A for dental students
Conversational chatbots focused on oral health
Research on clinical-domain instruction tuning

Citation

If you use this model or parts of it, please consider citing:

@misc{miriad2024,
  title={MIRIAD: Medical Instructional Record with Interactions and Answers Dataset},
  author={Xue, Yutong and others},
  year={2024},
  url={https://huggingface.co/datasets/miriad/miriad-4.4M}
}

Acknowledgements

LiquidAI for the LFM2 model series
Unsloth for training acceleration
MIRIAD authors for the publicly available medical dataset

yasserrmd
/

DentaInstruct-1.2B