Peter Kruger PRO

PeterKruger

http://pwk.it

AI & ML interests

Neural networks (since 1993), LLMs, AI-based financial analysis, LLM Benchmarks

Recent Activity

updated a Space 17 days ago

AutoBench Leaderboard

👀

Multi-run AutoBench leaderboard with historical navigation

upvoted an article 19 days ago

Article

Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2.

19 days ago

published an article 19 days ago

Article

Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2.

19 days ago

upvoted an article 26 days ago

Article

AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral...

26 days ago

published an article 26 days ago

Article

AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral...

26 days ago

upvoted an article about 1 month ago

Article

AutoBench Run 4 is out with Gemini 3 Pro, Gpt 5.1, Grok 4.1 etc. And the winner is not who you expect.

Nov 28, 2025

published an article about 1 month ago

Article

AutoBench Run 4 is out with Gemini 3 Pro, Gpt 5.1, Grok 4.1 etc. And the winner is not who you expect.

Nov 28, 2025

New activity in AutoBench/AutoBench-Leaderboard 2 months ago

Thanks for your leaderboard :)

#1 opened 2 months ago by zhiminy

upvoted an article 2 months ago

Article

AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark

Oct 29, 2025

published an article 2 months ago

Article

AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark

Oct 29, 2025

published an article 5 months ago

Article

AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org

Aug 20, 2025

published an article 7 months ago

Article

Introducing Bot Scanner: A "Skyscanner" for LLM answers

Jun 4, 2025

published an article 8 months ago

Article

AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model

Apr 29, 2025

published a Space 8 months ago

AutoBench Leaderboard

👀

Multi-run AutoBench leaderboard with historical navigation

upvoted an article 8 months ago

Article

Announcing MamayLM, an efficient state-of-the-art Ukrainian LLM

Apr 23, 2025

updated a Space 10 months ago

README

😻

updated a model 10 months ago

AutoBench/AutoBench_1.0

Updated Mar 7, 2025 • 2

commented on Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) 10 months ago

Nice and fully accurate. Excellent job. Thanks!

posted an update 10 months ago

Post

526

AutoBench 1.0 is live. The Collective-LLM-as-a-Judge model benchmark
https://huggingface.co/blog/PeterKruger/autobench

upvoted an article 10 months ago

Article

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

Mar 4, 2025
