ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code • Paper • 2506.02314 • Published 22 days ago
Reliable and Efficient Amortized Model-Based Evaluation • Collection • Datasets and Models for the REEval project • 24 items • Updated 8 days ago