STAIR

university

https://stair.cs.stanford.edu/

stai_research

stair-lab

AI & ML interests

None defined yet.

Recent Activity

sangttruong

authored 2 papers 2 days ago

ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code

Paper • 2506.02314 • Published 22 days ago

Reliable and Efficient Amortized Model-based Evaluation

Paper • 2503.13335 • Published Mar 17

yuhengtu

authored a paper 2 days ago

Reliable and Efficient Amortized Model-based Evaluation

Paper • 2503.13335 • Published Mar 17

yuhengtu

updated 2 datasets 4 days ago

stair-lab/reeval

Viewer • Updated 4 days ago • 5.69M • 53

stair-lab/monkey_queries

Preview • Updated 4 days ago • 232

yuhengtu

updated a dataset 5 days ago

stair-lab/helm_display_validity

Viewer • Updated 5 days ago • 997 • 16

yuhengtu

updated a dataset 8 days ago

stair-lab/platinum_detect

Updated 8 days ago • 99

sangttruong

updated a collection 8 days ago

Reliable and Efficient Amortized Model-Based Evaluation

Collection

Datasets and Models for the REEval project • 24 items • Updated 8 days ago

yuhengtu

updated a dataset 12 days ago

stair-lab/fantastic-bugs

Viewer • Updated 12 days ago • 402 • 180

yuhengtu

published a dataset 13 days ago

stair-lab/helm_display_validity

Viewer • Updated 5 days ago • 997 • 16

yuhengtu

updated a dataset 14 days ago

stair-lab/monkey_3d_data

Updated 14 days ago • 14

yuhengtu

published a dataset 14 days ago

stair-lab/monkey_3d_data

Updated 14 days ago • 14

sanmikoyejo

authored a paper about 2 months ago

The Leaderboard Illusion

Paper • 2504.20879 • Published Apr 29 • 70

Team members 9
