Before 2028, will any AI model achieve the same or greater benchmarks as o3 high with <= 1 million tokens per question?


Resolved YES on Dec 27


Specifically, the key benchmarks here are ARC, Codeforces Elo, and FrontierMath score. The relevant scores are 2727 Codeforces Elo, 87.5% on the ARC semi-private set, and 25.2% on FrontierMath.

The model must achieve these benchmarks while using no more than 1,000,000 reasoning tokens per question on average.

For context, o3 used 5.7B tokens per task to achieve its ARC score. It also scored 75.7% in low-compute mode using 33M tokens per task.

https://arcprize.org/blog/oai-o3-pub-breakthrough

Also note that if the final version of o3 has improved or worsened benchmarks, the goalposts will not change. The model must beat the benchmarks listed here.
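The criteria above can be read as a simple check, sketched here in Python. This is only an illustration, not an official resolution procedure: the function name, the dictionary keys, and the choice to pool all questions into one average token count are assumptions, and "same or greater" from the title is read as `>=`.

```python
from statistics import mean

# Thresholds copied from the market description (o3 high, as announced).
O3_HIGH = {
    "codeforces_elo": 2727,
    "arc_semi_private_pct": 87.5,
    "frontier_math_pct": 25.2,
}
# Budget from the description: at most 1,000,000 reasoning tokens per question on average.
MAX_AVG_TOKENS = 1_000_000


def meets_criteria(scores, tokens_per_question):
    """Hypothetical helper: True if every benchmark is matched or beaten
    and the average token count per question is within the budget."""
    beats_all = all(scores[name] >= target for name, target in O3_HIGH.items())
    within_budget = mean(tokens_per_question) <= MAX_AVG_TOKENS
    return beats_all and within_budget


# Made-up numbers for an imaginary model, purely for illustration.
example_scores = {
    "codeforces_elo": 2750,
    "arc_semi_private_pct": 88.0,
    "frontier_math_pct": 26.0,
}
example_tokens = [120_000, 640_000, 900_000]
print(meets_criteria(example_scores, example_tokens))
```

Pooling every question into one average is the simplest reading; a stricter reading would apply the budget to each benchmark separately.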




People are also trading

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

In what year will AI achieve a score of 85% or higher on the SimpleBench leaderboard?

Will an AI achieve >80% performance on the FrontierMath benchmark before 2027?

Chatbot Arena: How high will AI score in 2026?

Will AI top level capabilities generally be judged by question and answer benchmarks in 2029?

What will be the best OpenAI-Proof Q&A score by Dec 31, 2026?

Will a publicly known AI model achieve an 80% time horizon that is an 1 hour and 30 minutes by September 2026?

Will the "AI Longbets Turing Test by 2029" market go above 80% by EOY 2026?

Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?

Will $10,000 worth of AI hardware be able to train a GPT-3 equivalent model in under 1 hour, by EOY 2027?


Should I resolve this N/A? It is clear now that o3 high does not actually use millions of tokens per question, and that number referred to consensus@1024 prompting.

@JaundicedBaboon I vote yes

@Bayesian I'm just gonna resolve yes. Doing otherwise would feel unfair to the holders. Strictly speaking the thing mentioned in the question did happen.

Though actually, since o3 did slightly worse on some of the benchmarks than was announced in December, you could argue that it doesn't resolve yet, since no model has achieved those scores.


I'm not certain that someone will run these evals and report token counts, but in underlying capabilities I'm about 99.5% confident of this.
