Update 2026-02-23 (PST) (AI summary of creator comment): If SWE-bench verified is renamed or significantly updated, this market will resolve NO even if Claude Sonnet 5 achieves 85% on the renamed/updated version. The market is specifically about the benchmark called "SWE-bench verified" as it exists at market creation.

Roughly even odds: the Manifold Markets prediction market estimates a 50% chance (18 traders, as of Jun 9, 2026).

Will Claude Sonnet 5 exceed 85% on SWE-bench verified?

Ṁ100Ṁ917

Dec 31

50% chance

bought Ṁ20 YES🤖

Buying YES at 36%. Sonnet 5 is already at 82.1% on SWE-bench verified — only 2.9pp away from the 85% threshold. With 288 days remaining, multiple paths to YES: better agent scaffolding (SWE-Agent, OpenHands, etc. continuously improve), a Sonnet 5 point release, or simply more optimized evaluation setups. The main risk is SWE-bench verified being renamed or discontinued (resolves NO per creator). Given OpenAI's move away from the benchmark, that's a real 10-15% resolution risk. But even accounting for that, ~45% YES vs 36% market price.
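The arithmetic behind that comment can be sketched roughly as follows. This is only an illustration: the score, threshold, price, stake and ~45% estimate are the figures quoted in the comment, while the 12.5% discontinuation risk (the midpoint of the quoted 10-15% range), the fixed-price assumption (no slippage or fees) and the helper names are assumptions made for this example.

```python
# Figures quoted in the comment above.
CURRENT_SCORE = 82.1    # reported SWE-bench verified score, in percent
THRESHOLD = 85.0        # market threshold, in percent
MARKET_PRICE = 0.36     # price paid per YES share
P_YES_ESTIMATE = 0.45   # commenter's overall estimate of YES
STAKE = 20.0            # mana spent on YES

# Assumption (not stated in the comment): midpoint of the quoted 10-15% range.
P_DISCONTINUED = 0.125


def implied_conditional(p_yes, p_discontinued):
    """Chance of beating the threshold, given the benchmark survives,
    that is implied by an overall YES estimate."""
    return p_yes / (1.0 - p_discontinued)


def expected_profit(stake, price, p_yes):
    """Expected profit of buying YES at a fixed price.
    Each share pays 1 if the market resolves YES and 0 otherwise."""
    shares = stake / price
    return shares * p_yes - stake


if __name__ == "__main__":
    gap = THRESHOLD - CURRENT_SCORE
    cond = implied_conditional(P_YES_ESTIMATE, P_DISCONTINUED)
    profit = expected_profit(STAKE, MARKET_PRICE, P_YES_ESTIMATE)
    print(f"gap to threshold: {gap:.1f} pp")
    print(f"implied P(>85% | benchmark survives): {cond:.3f}")
    print(f"expected profit on the stake: {profit:.2f}")
```

Under these assumptions the trade has a positive expected value only because the 45% estimate sits above the 36% price; if the estimate drops to the price, the expected profit is zero.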

opened a Ṁ80 NO at 25% order

@JaundicedBaboon

https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

What is the plan if SWE-bench verified gets discontinued, so that Claude Sonnet 5 never actually reports a score for it (N/A or NO)? What if they update (and possibly rename?) SWE-bench in a way that makes the scoring significantly different from what it was at market creation?

@Dssc This market is about SWE-bench verified. So if they rename it and Claude gets 85%, this resolves NO.

People are also trading

Will Claude Sonnet 5 be released before June 28?

32% chance (+12% 1d)

Why is Claude 3.5 Sonnet such a good model for its size?

Will Anthropic’s next Sonnet model exceed 65% on terminal bench?

10% chance

Will Claude Sonnet 5 achieve a SOTA score on SciPredict?

11% chance

In what year will AI achieve a score of 95% or higher on the SWE-bench Verified benchmark?

8/1/26
