---
title: Medical LLM Leaderboard
emoji: 🌎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - leaderboard
short_description: A Benchmark of Large Language Models in the Clinic
---

We benchmark 22 LLMs in clinical settings across 11 tasks, 7 metrics, 17 datasets, and more than 20,000 test samples. Our results show that LLMs are poor clinical decision-makers on several complex clinical tasks.

GitHub: https://github.com/AI-in-Health/ClinicBench/

Paper: https://aclanthology.org/2024.emnlp-main.759.pdf

If this repository is helpful to your work, please consider citing 📑 our paper. Thank you!

@inproceedings{Liu2024ClinicBench,
  title={Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark},
  author={Fenglin Liu and Zheng Li and Hongjian Zhou and Qingyu Yin and Jingfeng Yang and Xianfeng Tang and Chen Luo and Ming Zeng and Haoming Jiang and Yifan Gao and Priyanka Nigam and Sreyashi Nag and Bing Yin and Yining Hua and Xuan Zhou and Omid Rohanian and Anshul Thakur and Lei Clifton and David A. Clifton},
  booktitle={Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2024}
}