How much cheaper to use will o3-equivalent or better models get before 2026?
Dec 31
≥2x: 90%
≥5x: 75%
≥10x: 70%
≥30x: 53%
≥100x: 30%

Any model with publicly known benchmark scores and inference costs counts, not just OpenAI's o-series.

I will consider a model to be "o3-equivalent or better" if it scores ≥25% on FrontierMath (o3 scored 25.2%) and performs similarly on other benchmarks.

(Note that o3's exact inference costs in the configuration used for benchmarking are currently unknown IIUC, though this market description will be updated with exact figures if they become public. This market can still resolve even without exact figures if e.g. OpenAI announce an o4 that's "10x cheaper" for roughly the same performance.)



3mo

Double scaling law: they just need to run the RL/CoT training for longer to get better performance from a more efficient model.

3mo

o4-mini might be cracked

This may be hard to resolve because the inference costs for specific benchmark performances or tasks can vary so much.

3mo

@JoshYou as a concrete example, let's say o4 costs the same per-token (for simplicity) and can achieve 25% on FrontierMath with 1/10 as many tokens as o3 did, but requires 1/5 as many tokens to match o3 on ARC-AGI.

What's worse, those ratios probably vary a lot depending on the performance threshold within a given benchmark. For example, it's over 100x more expensive to get 88% on ARC-AGI with o3 than it is to get 76%. So it could turn out that o4 is 5x cheaper than o3 at the 76% threshold but over 100x cheaper at the 88% threshold.
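
To make the ambiguity concrete, here is a minimal sketch of the arithmetic, assuming the hypothetical token fractions from the example above (1/10 on FrontierMath, 1/5 on ARC-AGI) and an unchanged per-token price. The function and variable names are made up for illustration; none of the figures are measurements.

```python
from fractions import Fraction

# Thresholds taken from the market's answer options.
THRESHOLDS = [2, 5, 10, 30, 100]


def cost_reduction(token_fraction, price_ratio=Fraction(1)):
    """Return how many times cheaper the new model is on one benchmark.

    token_fraction: new model's tokens / old model's tokens at equal score.
    price_ratio: new per-token price / old per-token price (1 = same price).
    """
    return 1 / (Fraction(token_fraction) * Fraction(price_ratio))


def thresholds_met(reduction):
    """List the market thresholds that a given cost reduction satisfies."""
    return [t for t in THRESHOLDS if reduction >= t]


# Hypothetical figures from the example in this thread, not real data.
scenario = {
    "FrontierMath": Fraction(1, 10),
    "ARC-AGI": Fraction(1, 5),
}

reductions = {name: cost_reduction(frac) for name, frac in scenario.items()}
for name, r in reductions.items():
    print(f"{name}: {r}x cheaper, meets {thresholds_met(r)}")

# Thresholds whose resolution depends on which benchmark you pick.
best = max(reductions.values())
worst = min(reductions.values())
disputed = sorted(set(thresholds_met(best)) - set(thresholds_met(worst)))
print("Depends on benchmark choice:", disputed)
```

Under these assumptions the ≥2x and ≥5x options are met on both benchmarks, while ≥10x is met on one and not the other, which is exactly the kind of case that could push the market toward N/A.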

@JoshYou Hmmm... Yeah, there might be a relatively high chance of this resolving N/A when you put it that way, but I'll do what I can when the time comes.
