SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks Paper • 2506.10954 • Published 26 days ago
Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots
RMCBench: Benchmarking Large Language Models' Resistance to Malicious Code Paper • 2409.15154 • Published Sep 23, 2024