BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks Jun 18, 2024 • 49
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published 6 days ago • 66
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5 • 42
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization Paper • 2505.23387 • Published May 29 • 9
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30 • 261
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents Paper • 2505.23671 • Published May 29 • 4
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation Paper • 2504.07072 • Published Apr 9 • 9
It's the same but not the same: Do LLMs distinguish Spanish varieties? Paper • 2504.20049 • Published Apr 8
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning Paper • 2505.11049 • Published May 16 • 61
SmolVLM is now available on PocketPal: you can run it offline on your smartphone to interpret the world around you. 🌍📱 And check out this real-time camera demo by @ngxson, powered by llama.cpp: https://github.com/ngxson/smolvlm-realtime-webcam https://x.com/pocketpal_ai
New launch: See the energy use of chatbot conversations, in real time. =) jdelavande/chat-ui-energy Great work from @JulienDelavande!