UTAustin-AIHealth

Welcome to UTAustin-AIHealth – a hub dedicated to advancing research in medical AI. This repo contains the MedHallu dataset, which underpins our recent work:

MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models

MedHallu is a benchmark for evaluating how well large language models detect hallucinations in medical question answering. The dataset is organized into two configurations (subsets), each selected by name when loading:

  • pqa_labeled: Contains 1,000 high-quality, human-annotated samples derived from PubMedQA.
  • pqa_artificial: Contains 9,000 samples generated via an automated pipeline from PubMedQA.
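Detecting hallucinations can be framed as a binary decision: given a question and a candidate answer, is the answer hallucinated? As a minimal sketch, and not necessarily the exact evaluation protocol used in the MedHallu paper, a detector can be scored with precision, recall, and F1, treating "hallucinated" as the positive class. The function name `detection_scores` and the `detector` callable are hypothetical placeholders for whatever model or prompt you evaluate.

```python
from typing import Callable, Dict, Iterable, Tuple


def detection_scores(
    examples: Iterable[Tuple[str, str, int]],
    detector: Callable[[str, str], object],
) -> Dict[str, float]:
    """Score a hypothetical hallucination detector on (question, answer, label) triples.

    label is 1 if the answer is hallucinated and 0 if it is faithful.
    detector(question, answer) returns a truthy value when it flags a hallucination.
    """
    tp = fp = fn = tn = 0
    for question, answer, label in examples:
        # Coerce the detector's output to 1 (flagged) or 0 (not flagged).
        pred = 1 if detector(question, answer) else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
        else:
            tn += 1

    # Guard every ratio against division by zero (e.g. an empty example list).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

F1 on the hallucinated class is used here because a detector that never flags anything can still reach high accuracy when faithful answers dominate; precision and recall make that failure visible.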

Setup Environment

To work with the MedHallu dataset, please install the Hugging Face `datasets` library using pip:

```shell
pip install datasets
```

How to Use MedHallu

Downloading the Dataset:

```python
from datasets import load_dataset

# Load the 'pqa_labeled' configuration: 1,000 high-quality, human-annotated samples.
medhallu_labeled = load_dataset("UTAustin-AIHealth/MedHallu", "pqa_labeled")

# Load the 'pqa_artificial' configuration: 9,000 samples generated via an automated pipeline.
medhallu_artificial = load_dataset("UTAustin-AIHealth/MedHallu", "pqa_artificial")

# Each call returns a DatasetDict keyed by split name; printing it shows the splits and columns.
print(medhallu_labeled)
```
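Assuming each record pairs a question with a ground-truth answer and a hallucinated answer, the records can be flattened into labeled detection examples. This is a sketch under stated assumptions: the column names `Question`, `Ground Truth`, and `Hallucinated Answer` and the single `train` split are guesses, so confirm them against the printed dataset before relying on the helpers. Both helper names are hypothetical.

```python
from typing import Iterable, Iterator, List, Mapping, Tuple

# Hypothetical column names; verify against dataset.features before use.
QUESTION_KEY = "Question"
TRUTH_KEY = "Ground Truth"
HALLUCINATED_KEY = "Hallucinated Answer"


def to_detection_examples(
    records: Iterable[Mapping[str, str]],
) -> Iterator[Tuple[str, str, int]]:
    """Yield (question, answer, label) triples: 0 = ground truth, 1 = hallucinated."""
    for record in records:
        question = record[QUESTION_KEY]
        yield question, record[TRUTH_KEY], 0
        yield question, record[HALLUCINATED_KEY], 1


def load_labeled_examples(config: str = "pqa_labeled") -> List[Tuple[str, str, int]]:
    """Download one MedHallu configuration and flatten it into detection examples."""
    # Imported lazily so to_detection_examples works without the datasets library.
    from datasets import load_dataset

    # Assumes the configuration exposes a single split named "train".
    dataset = load_dataset("UTAustin-AIHealth/MedHallu", config)["train"]
    # Iterating a datasets.Dataset yields one dict per row.
    return list(to_detection_examples(dataset))
```

For example, `load_labeled_examples("pqa_artificial")` would flatten the larger automated configuration in the same way, producing two examples per record and a balanced label distribution, subject to the same schema assumptions.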

License

This dataset and associated resources are distributed under the MIT License.

Citations

If you find MedHallu useful in your research, please consider citing our work:

```bibtex
@misc{MedHallu,
  title={MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models},
  author={},
  booktitle={},
  year={2025},
  publisher={}
}
```

Contact

For further information or inquiries about MedHallu, please reach out at [email protected].