What will be the best score (5/5 reliability) on ZeroBench by December 31st 2025?

resolved Jan 1

Range	Probability
0 - 10	100% / 90%
11 - 20	4%
21 - 30	0.8%
31 - 40	0.7%
41 - 50	0.7%
51 - 60	0.7%
61 - 70	0.7%
71 - 80	0.7%
81 - 90	0.7%
91 - 100	0.8%

ZeroBench is a benchmark for visual reasoning, introduced by Roberts et al. in "ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models" (https://arxiv.org/abs/2502.09696).

This market will use the variant of the benchmark frozen one week after the initial release (following the public benchmark red-teaming stage to identify flawed/ambiguous questions).

The temperature used for the 5/5 reliability evaluation will be the default setting of each LLM API provider. Where that default is ambiguous, we will use a temperature of 0.7.
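
A minimal sketch of how these figures could be computed, assuming pass@k counts a question as solved when at least one of its k sampled answers is correct and k/k reliability counts it only when all k are correct. The function names and the example data are hypothetical; only the 0.7 fallback comes from the description above.

```python
from typing import Optional, Sequence

FALLBACK_TEMPERATURE = 0.7  # fallback stated in the market description


def evaluation_temperature(provider_default: Optional[float]) -> float:
    """Use the provider default; fall back to 0.7 when it is unknown (None)."""
    if provider_default is None:
        return FALLBACK_TEMPERATURE
    return provider_default


def pass_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Percent of questions with at least one correct sample (assumed definition)."""
    if not results:
        raise ValueError("results must contain at least one question")
    return 100.0 * sum(any(samples) for samples in results) / len(results)


def k_of_k_reliability(results: Sequence[Sequence[bool]]) -> float:
    """Percent of questions where every sample is correct (assumed definition)."""
    if not results:
        raise ValueError("results must contain at least one question")
    return 100.0 * sum(all(samples) for samples in results) / len(results)


# Hypothetical grading of 4 questions, 5 samples each (not real ZeroBench data).
example = [
    [True, True, True, True, True],       # counts for pass@5 and 5/5
    [True, False, True, True, True],      # counts for pass@5 only
    [False, False, False, False, False],  # counts for neither
    [False, False, True, False, False],   # counts for pass@5 only
]

print(pass_at_k(example))           # 75.0
print(k_of_k_reliability(example))  # 25.0
```

Since 5/5 reliability requires every sample to be correct, it can never exceed pass@5 on the same set of samples.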

People are also trading

In what year will AI achieve a score of 85% or higher on the SimpleBench leaderboard?

What will be the best GSOBench score by Dec 31, 2026?

When will an AI lab report 50% or better result on ZeroBench?

What will be the highest score on the SWE-bench pro private set before 2027?

What will be the highest Epoch Capabilities Index score in 2026?

What will be the best Remote Labor Index score by Dec 31, 2026?

Which LLM Maker will hold the top Safety Score for Spiral-Bench on https://eqbench.com/spiral-bench.html on Jan1, 2027?

Top SWE-Bench Pro score by Jan 1, 2027?

What will be the best performance on EnigmaEval by December 31st 2026?

In what year will AI achieve a score of 95% or higher on the PhysBench leaderboard?

@creator can resolve

bought Ṁ150 YES

Nov 19, 2025
Gemini 3
"pass@5: 19% (prev SOTA 10%)
5/5 reliability: 5% (prev 3%)"
https://x.com/JRobertsAI/status/1991163723436663125?s=20

As of May 24th 2025, Claude 4 Opus is the new SotA:

https://x.com/JRobertsAI/status/1926325748303872203

4% pass@1

As of March 28th 2025, Gemini 2.5 Pro is the new SotA: https://x.com/JRobertsAI/status/1905577784300183653

3% pass@1

5% pass@5

1% 5/5 reliability
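
For reference, a small sketch of how a reported 5/5 reliability score maps onto this market's answer ranges. It assumes scores are whole-number percentages; `answer_range` is a hypothetical helper, not part of the market or the benchmark.

```python
def answer_range(score_percent: int) -> str:
    """Map a whole-number percentage score to one of the market's answer ranges."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    if score_percent <= 10:
        return "0 - 10"  # the first range also includes 0
    # Round up to the next multiple of 10 to get the range's upper bound.
    upper = ((score_percent - 1) // 10 + 1) * 10
    return f"{upper - 9} - {upper}"


# 5/5 reliability figures quoted in the comments above.
print(answer_range(1))  # Gemini 2.5 Pro: "0 - 10"
print(answer_range(5))  # Gemini 3: "0 - 10"
```

Both reported 5/5 reliability figures fall in the 0 - 10 range.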
