Article Open Source AI: A Cornerstone of Digital Sovereignty By frimelle and 1 other • 7 days ago • 14
Article MCP is at a Tipping Point: Here's Why You Should Care By fdaudens • 8 days ago • 15
BehaviorBox: Automated Discovery of Fine-Grained Performance Differences Between Language Models Paper • 2506.02204 • Published 16 days ago • 1
Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! 13 days ago • 43
Article How to Build an MCP Server with Gradio By abidlabs and 1 other • Apr 30 • 173
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated 13 days ago • 25
Reward Bench 2 Collection Datasets, spaces, and models for the Reward Bench 2 benchmark and paper! • 11 items • Updated 16 days ago • 11
Article *Context Is Gold to Find the Gold Passage*: Evaluating and Training Contextual Document Embeddings By manu and 1 other • 17 days ago • 24
Article AI Policy @🤗: Response to the 2025 National AI R&D Strategic Plan By evijit and 2 others • 16 days ago • 13
Article CodeAgents + Structure: A Better Way to Execute Actions By akseljoonas and 1 other • 22 days ago • 57
Article Bigger isn't always better: how to choose the most efficient model for context-specific tasks 🌱🧑🏼‍💻 By sasha • 22 days ago • 21
Article Interactive Tools for machine learning, deep learning, and math By Suzana • 23 days ago • 44