ResearchGym: Evaluating Language Model Agents on Real-World AI Research • Paper • 2602.15112 • Published 14 days ago • 20
Jais-2-Family • Collection • The 2nd generation of the Jais Large Language Models Family • 4 items • Updated 11 days ago • 13
Article • OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments • 19 days ago • 30
Article • Community Evals: Because we're done trusting black-box leaderboards over the community • 27 days ago • 83