Background:
SimpleBench is a 200‑item multiple‑choice test designed to probe everyday human reasoning that still eludes frontier LLMs, including spatio‑temporal reasoning, social intelligence and linguistic “trick” questions. Unlike most other benchmarks, SimpleBench is one on which humans still outperform AI.
State of play:
• Human reference accuracy: 83.7 %
• Best AI accuracy as of 2025 (Gemini 2.5 Pro): 62.4 %
Why this milestone matters:
• Everyday reasoning: Passing SimpleBench would indicate that LLMs can handle commonsense scenarios on which today’s models remain brittle.
• Benchmark headroom: Unlike MMLU, SimpleBench has not yet been “solved”, so it remains a useful yardstick for comparing AI progress against human performance.
Resolution Criteria:
This market resolves to the year‑bracket in which a fully automated AI system first achieves ≥ 85% average accuracy on SimpleBench (ALL metric), subject to all of the following:
Verification – The claim must be confirmed by either:
• a peer‑reviewed paper or arXiv preprint reporting the result, or
• a public leaderboard entry on the SimpleBench Official Website.
Compute resources – Unlimited.
Fine Print:
If the milestone is first reached in a given year, only the earliest bracket that still contains that year resolves YES; all other brackets resolve NO.
Example: Should an AI system hit ≥ 85% on SimpleBench in 2025, only “Before 2026” resolves YES; all other brackets resolve NO.
Cut‑off – If no AI model reaches ≥ 85% by 31 Dec 2033, the market resolves “Not Applicable”.
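Purely as an illustration of the clarification above, here is a minimal Python sketch of the bracket‑resolution rule. The bracket labels and year cut‑offs used in it are assumed for the example and are not this market’s actual answer options.

```python
# Illustrative sketch only: shows how "only the earliest bracket that still
# contains the milestone year resolves YES" plays out. Bracket labels and
# cut-off years below are hypothetical, not this market's actual options.

def resolve(brackets, milestone_year, cutoff_year=2033):
    """brackets: list of (label, last_year) pairs, meaning the milestone occurs
    in last_year or earlier; ordered from earliest to latest bracket."""
    if milestone_year is None or milestone_year > cutoff_year:
        # Threshold never reached by 31 Dec of the cut-off year -> "Not Applicable".
        return {label: "N/A" for label, _ in brackets}
    results, winner_found = {}, False
    for label, last_year in brackets:
        contains = milestone_year <= last_year
        if contains and not winner_found:
            results[label] = "YES"   # earliest bracket containing the milestone year
            winner_found = True
        else:
            results[label] = "NO"
    return results

# Example from the fine print: milestone first reached in 2025.
brackets = [("Before 2026", 2025), ("Before 2028", 2027),
            ("Before 2030", 2029), ("Before 2034", 2033)]
print(resolve(brackets, milestone_year=2025))
# -> {'Before 2026': 'YES', 'Before 2028': 'NO', 'Before 2030': 'NO', 'Before 2034': 'NO'}
```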