AI & ML interests
Neural networks (since 1993), LLMs, AI-based financial analysis, LLM Benchmarks
Recent Activity
Organizations
Introducing AutoBench 2.0: Our New Benchmarking Platform is Out Just in Time to Evaluate GPT 5.2.
AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral...
AutoBench Run 4 is out with Gemini 3 Pro, GPT 5.1, Grok 4.1 etc. And the winner is not who you expect.
published an article about 2 months ago
AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark
AutoBench Third Run: Revolutionizing LLM Evaluation with Record-Breaking Scale, Accuracy, and a New Home at autobench.org
Introducing Bot Scanner: A "Skyscanner" for LLM answers
AutoBench Run 2 Results are Out! Surprise: Gemini 2.5 Pro is not the Best Affordable Thinking Model
Escape the Benchmark Trap: AutoBench – the Collective-LLM-as-a-Judge System for Evaluating AI models (ASI-Ready!)