view article Article Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2. 17 days ago • 1
view article Article AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral... 24 days ago • 3
view article Article AutoBench Run 4 is out with Gemini 3 Pro, GPT 5.1, Grok 4.1 etc. And the winner is not who you expect. Nov 28, 2025 • 1
view article Article AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark Oct 29, 2025 • 4
view article Article AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org Aug 20, 2025 • 6
view article Article AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model Apr 29, 2025 • 6
view article Article Announcing MamayLM, an efficient state-of-the-art Ukrainian LLM Apr 23, 2025 • 62
view post Post 526 AutoBench 1.0 is live. The Collective-LLM-as-a-Judge model benchmark: https://huggingface.co/blog/PeterKruger/autobench
view article Article Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!) Mar 4, 2025 • 9