medical_llm_leaderboard

fenglinliu committed on Nov 10, 2024

Commit

156e0cd

verified

1 Parent(s): 6a83297

Update README.md

Files changed (1)

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: Shopping MMLU Leaderboard
 emoji: 🌎
 colorFrom: blue
 colorTo: green
@@ -10,21 +10,23 @@ pinned: true
 license: apache-2.0
 tags:
 - leaderboard
-short_description: 'Massive Multi-Task LLM Benchmark for Online Shopping'
 ---
-In this leaderboard, we display evaluation results obtained with Shopping MMLU. The space provides an overall leaderboard, consisting of 4 main online shopping skills:
-- Shopping Concept Understanding
-- Shopping Knowledge Reasoning
-- User Behavior Alignment
-- Multi-lingual Abilities
-Github: https://github.com/KL4805/ShoppingMMLU
-Report: https://arxiv.org/abs/2410.20745
-Please consider to cite the report if the resource is useful to your research:
 ```BibTex
 ```

 ---
+title: Medical LLM Leaderboard
 emoji: 🌎
 colorFrom: blue
 colorTo: green
 license: apache-2.0
 tags:
 - leaderboard
+short_description: A Benchmark of Large Language Models in the Clinic
 ---
+We benchmark 22 LLMs in the clinic across 11 tasks, 7 metrics, 17 datasets, and over 20,000 test samples.
+We reveal that LLMs are poor clinical decision-makers in multiple complex clinical tasks.
+Github: https://github.com/AI-in-Health/ClinicBench/
+Paper: https://aclanthology.org/2024.emnlp-main.759.pdf
+Please consider citing 📑 our papers if our repository is helpful to your work, thanks sincerely!
 ```BibTex
+@article{zhou2023survey,
+  title={A Survey of Large Language Models in Medicine: Progress, Application, and Challenge},
+  author={Hongjian Zhou, Fenglin Liu, Boyang Gu, Xinyu Zou, Jinfa Huang, Jinge Wu, Yiru Li, Sam S. Chen, Peilin Zhou, Junling Liu, Yining Hua, Chengfeng Mao, Xian Wu, Yefeng Zheng, Lei Clifton, Zheng Li, Jiebo Luo, David A. Clifton},
+  journal={arXiv preprint arXiv:2311.05112},
+  year={2023}
+}
 ```