SWE-smith: Scaling Data for Software Engineering Agents Paper • 2504.21798 • Published Apr 30, 2025
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? Paper • 2310.06770 • Published Oct 10, 2023
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback Paper • 2306.14898 • Published Jun 26, 2023
DevBench: A Comprehensive Benchmark for Software Development Paper • 2403.08604 • Published Mar 13, 2024