Peter Kruger PRO

PeterKruger

http://pwk.it

AI & ML interests

Neural networks (since 1993), LLMs, AI-based financial analysis, LLM Benchmarks

Recent Activity

published an article 7 days ago

AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org

updated a Space 8 days ago

AutoBench/AutoBench-Leaderboard

published an article 3 months ago

Introducing Bot Scanner: A "Skyscanner" for LLM answers

published an article 7 days ago

Article

AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org

7 days ago • 6

updated a Space 8 days ago

AutoBench Leaderboard

👀

Multi-run AutoBench leaderboard with historical navigation

published an article 3 months ago

Article

Introducing Bot Scanner: A "Skyscanner" for LLM answers

Jun 4

published an article 4 months ago

Article

AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model

Apr 29 • 6

published a Space 4 months ago

AutoBench Leaderboard

👀

Multi-run AutoBench leaderboard with historical navigation

upvoted an article 4 months ago

Article

Announcing MamayLM, an efficient state-of-the-art Ukrainian LLM

and 2 others • Apr 23 • 59

updated a Space 6 months ago

README

😻

updated a model 6 months ago

AutoBench/AutoBench_1.0

Updated Mar 7 • 2

commented on Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) 6 months ago

Nice and fully accurate. Excellent job. Thanks!

posted an update 6 months ago

Post

503

AutoBench 1.0 is live. The Collective-LLM-as-a-Judge model benchmark
https://huggingface.co/blog/PeterKruger/autobench

upvoted an article 6 months ago

Article

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

Mar 4 • 7

liked a Space 6 months ago

AutoBench 1.0 Demo

🐠

Collective-Model-As-Judge LLM Benchmark

liked a model 6 months ago

AutoBench/AutoBench_1.0

Updated Mar 7 • 2

updated a Space 6 months ago

AutoBench 1.0 Demo

🐠

Collective-Model-As-Judge LLM Benchmark

published an article 6 months ago

Article

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

Mar 4 • 7

updated a dataset 6 months ago

AutoBench/AutoBench_Results_20_LLMs

Preview • Updated Mar 4 • 2

published a dataset 6 months ago

AutoBench/AutoBench_Results_20_LLMs

Preview • Updated Mar 4 • 2

published 2 Spaces 6 months ago

README

😻

AutoBench 1.0 Demo

🐠

Collective-Model-As-Judge LLM Benchmark

published a model 6 months ago

AutoBench/AutoBench_1.0

Updated Mar 7 • 2
