Running 575 Scaling test-time compute: Enhance math problem solving by scaling test-time compute
Running 552 Vision Arena (Testing VLMs side-by-side): Analyze images to detect and label objects