AIM-Harvard

university

https://aim.hms.harvard.edu

dbittermanmd

AI & ML interests

Artificial Intelligence in Medicine (AIM) Program (NLP group/Bitterman lab: https://www.bittermanlab.org/)

AIM-Harvard's activity

clefourrier

authored a paper 20 days ago

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published 22 days ago • 17

Luoyu8631

updated a dataset about 1 month ago

AIM-Harvard/PKG_synthetic_patient_cases

Viewer • Updated Nov 22 • 200 • 16

shanchen

updated a dataset about 1 month ago

AIM-Harvard/multilingual_toxicity_dataset

Viewer • Updated Nov 20 • 25k • 494

shanchen

authored a paper about 1 month ago

ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?

Paper • 2411.06469 • Published Nov 10 • 17

daniellebitt

authored a paper about 1 month ago

ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?

Paper • 2411.06469 • Published Nov 10 • 17

shanchen

updated a dataset about 2 months ago

AIM-Harvard/sorrybench

Viewer • Updated Nov 7 • 9.45k • 46

shanchen

authored 2 papers 2 months ago

Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation

Paper • 2409.20385 • Published Sep 30

WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation

Paper • 2410.12722 • Published Oct 16 • 5

clefourrier

authored 2 papers 6 months ago

The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

Paper • 2404.05904 • Published Apr 8 • 8

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 185

daniellebitt

authored a paper 6 months ago

Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks

Paper • 2406.12066 • Published Jun 17 • 8

mingye94

authored a paper 6 months ago

Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks

Paper • 2406.12066 • Published Jun 17 • 8

gallifantjack

authored a paper 6 months ago

Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks

Paper • 2406.12066 • Published Jun 17 • 8

NikolajMunch

authored 2 papers 6 months ago

Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias

Paper • 2405.05506 • Published May 9 • 1

Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks

Paper • 2406.12066 • Published Jun 17 • 8

shanchen

authored 3 papers 6 months ago

Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks

Paper • 2406.12066 • Published Jun 17 • 8

Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias

Paper • 2405.05506 • Published May 9 • 1

Measuring Pointwise $\mathcal{V}$-Usable Information In-Context-ly

Paper • 2310.12300 • Published Oct 18, 2023 • 1

clefourrier

posted an update 8 months ago

In basic chatbots, errors are annoyances. In medical LLMs, errors can have life-threatening consequences 🩸

It's therefore vital to benchmark/follow advances in medical LLMs before even thinking about deployment.

This is why a small research team introduced a medical LLM leaderboard: to get reproducible, comparable results across LLMs and to let everyone follow advances in the field.

openlifescienceai/open_medical_llm_leaderboard

Congrats to @aaditya and @pminervini!
Learn more in the blog: https://huggingface.co/blog/leaderboard-medicalllm

clefourrier

posted an update 8 months ago

Contamination-free code evaluations with LiveCodeBench! 🖥️

LiveCodeBench is a new leaderboard, which contains:
- complete code evaluations (on code generation, self repair, code execution, tests)
- my favorite feature: problem selection by publication date 📅

This feature means that you can get model scores averaged only over new problems that are not in the training data. This means... contamination-free code evals! 🚀
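To make the idea of date-based problem selection concrete, here is a minimal sketch. It is not LiveCodeBench's actual code: `ProblemResult`, `contamination_free_score`, and the sample data are hypothetical names and values invented for illustration, and it assumes the model's training cutoff date is known.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProblemResult:
    """Hypothetical record of one model attempt on one problem."""
    problem_id: str
    published: date  # date the problem was first released
    passed: bool     # whether the model's solution passed all tests


def contamination_free_score(
    results: list[ProblemResult], training_cutoff: date
) -> Optional[float]:
    """Average pass rate over problems published strictly after the cutoff.

    Returns None when no problem is newer than the cutoff.
    """
    fresh = [r for r in results if r.published > training_cutoff]
    if not fresh:
        return None
    return sum(r.passed for r in fresh) / len(fresh)


# Illustrative data only; not real benchmark results.
results = [
    ProblemResult("p1", date(2023, 5, 1), True),
    ProblemResult("p2", date(2023, 9, 15), True),
    ProblemResult("p3", date(2024, 1, 10), False),
    ProblemResult("p4", date(2024, 2, 20), True),
]

# Assumed training cutoff of 2023-12-31: only p3 and p4 are counted.
score = contamination_free_score(results, date(2023, 12, 31))
```

Problems published before the cutoff may have been seen during training, so excluding them keeps the average from being inflated by memorized solutions.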

Check it out!

Blog: https://huggingface.co/blog/leaderboard-livecodebench
Leaderboard: livecodebench/leaderboard

Congrats to @StringChaos @minimario @xu3kev @kingh0730 and @FanjiaYan for the super cool leaderboard!

Team members 16
