Update 2026-02-23 (PST) (AI summary of creator comment): If SWE-bench verified is renamed or significantly updated, this market will resolve NO even if Claude Sonnet 5 achieves 85% on the renamed/updated version. The market is specifically about the benchmark called "SWE-bench verified" as it exists at market creation.
Buying YES at 36%. Sonnet 5 is already at 82.1% on SWE-bench verified, only 2.9pp away from the 85% threshold. With 288 days remaining, there are multiple paths to YES: better agent scaffolding (SWE-Agent, OpenHands, etc. keep improving), a Sonnet 5 point release, or simply more optimized evaluation setups. The main risk is SWE-bench verified being renamed or discontinued (which resolves NO per the creator). Given OpenAI's move away from the benchmark, I'd put that resolution risk at a real 10–15%. But even accounting for that, I estimate ~45% YES vs. the 36% market price.
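The arithmetic in the comment above can be sketched as follows. This assumes (the commenter does not say so explicitly) that the ~45% estimate factors as P(benchmark survives unchanged) × P(score ≥ 85 | survives), with renaming or discontinuation always resolving NO per the creator. The function names are hypothetical, and fees and time value of money are ignored.

```python
# Minimal sketch of the bet's arithmetic; names are hypothetical.
THRESHOLD = 85.0
CURRENT_SCORE = 82.1
gap_pp = round(THRESHOLD - CURRENT_SCORE, 1)  # percentage points still needed


def expected_profit_per_share(p_yes: float, price: float) -> float:
    """Expected profit of one YES share paying 1.0 on YES, 0.0 on NO (fees ignored)."""
    return p_yes * 1.0 - price


def implied_conditional(p_yes: float, p_discontinue: float) -> float:
    """P(score >= 85 | benchmark survives) needed to reach p_yes,
    assuming a rename or discontinuation always resolves NO."""
    return p_yes / (1.0 - p_discontinue)


if __name__ == "__main__":
    price, p_yes = 0.36, 0.45
    profit = expected_profit_per_share(p_yes, price)
    print(f"gap: {gap_pp}pp")
    print(f"expected profit per share: {profit:.2f} ({profit / price:.0%} return)")
    # Resolution-risk range quoted in the comment
    for p_disc in (0.10, 0.15):
        print(f"p_discontinue={p_disc:.2f} -> needs "
              f"P(hit | survives) = {implied_conditional(p_yes, p_disc):.3f}")
```

Under these assumptions, a 45% overall estimate implies roughly a 50–53% chance of hitting 85% conditional on the benchmark surviving, and about 0.09 expected profit per share bought at 0.36.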
https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/
What is the plan if SWE-bench verified is discontinued and Claude Sonnet 5 never actually reports a score on it (N/A or NO)? And what if they update (and possibly rename) SWE-bench in a way that makes the scoring significantly different from what it was at market creation?
@Dssc This market is about SWE-bench verified, so if they rename it and Claude gets 85%, this resolves NO.