
Peter Kruger PRO

PeterKruger

http://pwk.it

AI & ML interests

Neural networks (since 1993), LLMs, AI-based financial analysis, LLM Benchmarks

Recent Activity

updated a Space 7 days ago

AutoBench/AutoBench-Leaderboard

upvoted an article 9 days ago

Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2.

published an article 9 days ago

Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2.


Organizations

published an article 9 days ago

Article

Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2.


published an article 16 days ago

Article

AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral...


published an article 28 days ago

Article

AutoBench Run 4 is out with Gemini 3 Pro, Gpt 5.1, Grok 4.1 etc. And the winner is not who you expect.


published an article about 2 months ago

Article

AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark

Oct 29


published an article 4 months ago

Article

AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org

Aug 20


published an article 7 months ago

Article

Introducing Bot Scanner: A "Skyscanner" for LLM answers

Jun 4

published an article 8 months ago

Article

AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model

Apr 29


published an article 10 months ago

Article

Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)

Mar 4

