What will be the best performance on FrontierMath Tier 4 by December 31st 2025?

resolved Jan 1

20 - 30%: 98.1% (resolved at 100%)
0% - 10%: 0.1%
10 - 20%: 0.2%
30 - 40%: 0.3%
40 - 50%: 0.2%
50 - 60%: 0.2%
60 - 70%: 0.2%
70 - 80%: 0.2%
80 - 90%: 0.2%
90 - 100%: 0.2%

The best performance by an AI system on FrontierMath Tier 4 as of December 31st, 2025. See https://epoch.ai/frontiermath, under the section Tier 4, for the results accepted for the purpose of this market. Performance is measured as Pass@1 accuracy.

At market creation (the day the benchmark was officially announced), the best model is o4-mini (high), with a score of 6.25%.
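As a rough illustration of how a score maps onto this market's answers, here is a minimal sketch. It assumes a 48-problem Tier 4 set (taken from the "4/48" figure quoted in the comments) and assumes each range includes its lower bound but not its upper bound, which the market does not state. The function names `pass_at_1_percent` and `answer_bucket` are hypothetical.

```python
def pass_at_1_percent(solved: int, total: int) -> float:
    """Pass@1 accuracy as a percentage: problems solved on the first attempt / total."""
    if total <= 0 or not 0 <= solved <= total:
        raise ValueError("need 0 <= solved <= total and total > 0")
    return 100 * solved / total


def answer_bucket(score_percent: float) -> str:
    """Map a score to a 10-point range label.

    Assumption: ranges are half-open [lo, hi), except that 100% falls
    in the top range. The market itself does not specify boundary handling.
    """
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    lo = min(int(score_percent // 10) * 10, 90)
    return f"{lo} - {lo + 10}%"


# Assumed problem count of 48, inferred from the "4/48" figure in the comments.
TOTAL_PROBLEMS = 48

for solved in (3, 4):
    score = pass_at_1_percent(solved, TOTAL_PROBLEMS)
    print(f"{solved}/{TOTAL_PROBLEMS} = {score:.2f}% -> {answer_bucket(score)}")
```

Under these assumptions, 3 of 48 problems gives the 6.25% quoted above, and each additional solved problem adds about 2.08 percentage points.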

See also best performance on FrontierMath Tier 1-3:

The correct answer is 40%.

@MaxLennartson I think that was for tiers 1-3, not tier 4.

@TimothyJohnson5c16 This was where I was getting my information from.

@MaxLennartson yeah, that was tiers 1-3. Before the tier 4 release, 'FrontierMath score' referred exclusively to the merged score across tiers 1-3.

gemini 3 pro gets 19%!!

GPT-5 (high) solved one more problem than previous models, for a score of 8.3% (4/48).

My intuition is that you can't get much better results on this benchmark just by scaling current methods, and that no one will implement a new method before the end of 2025.

I think there's still a fair bit of room to push it, but I agree that probably no one will implement a new method before the end of 2025. Do you think the first system to reach 30% will use a new method?

@Bayesian Yeah. But I haven't done much research on the questions; I just know they are unlike anything in trainable datasets and require a lot of reasoning steps. I think we need a new method that helps AIs better generalize what they have learned and their skills across different domains.