Article: AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org • By PeterKruger • 7 days ago
Article: AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model • By PeterKruger • Apr 29
Article: Announcing MamayLM, an efficient state-of-the-art Ukrainian LLM • By INSAIT-Institute and 2 others • Apr 23
Post: AutoBench 1.0 is live. The Collective-LLM-as-a-Judge model benchmark: https://huggingface.co/blog/PeterKruger/autobench
Article: Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) • By PeterKruger • Mar 4