MANIFOLD
Will SotA on SWE-Lancer (Diamond) reach 400K USD (80%) in 2025?
Resolved NO on Jan 18

SWE-Lancer evaluates AI agents' ability to complete real-world freelance software engineering tasks sourced from Upwork, mapping performance directly to monetary value.

This market focuses on the open-sourced SWE-Lancer Diamond evaluation set, which comprises 502 tasks (237 IC SWE, 265 SWE Manager) collectively valued at $500,800 USD.

The current State-of-the-Art (SotA) reported in the paper for total earnings on SWE-Lancer Diamond is $208,050 USD under a pass@1 metric (achieved by Claude 3.5 Sonnet).

The figures above (taken from the paper) illustrate the key idea behind the benchmark (which spans IC SWE and SWE Manager tasks).

Market Details

  • Source: This market resolves based on published data from the maintainers of the SWE-Lancer benchmark (e.g., OpenAI researchers listed in the paper or designated successors) or credible third-party evaluations using the official benchmark configuration and SWE-Lancer Diamond set.

  • Metric: Total Payout Earned (USD) on the SWE-Lancer Diamond set. This is the sum of the real-world payouts associated with each successfully completed task (pass@1) in the Diamond set (across both IC SWE and SWE Manager tasks).

  • Target Score: Greater than or equal to $400,000.00 USD.

  • Reporting Window: The score must be achieved by an AI agent and credibly reported (e.g., in a peer-reviewed publication, arXiv preprint, official leaderboard, major AI lab report, public repo) before December 31st, 2025, 23:59 UTC.
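
As a rough illustration of the metric described above, the sketch below sums the payouts of tasks passed on the first attempt. The `TaskResult` record, its field names, and the three sample tasks are hypothetical and are not taken from the benchmark's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical record for one Diamond task (not the benchmark's real schema)."""
    task_id: str
    category: str      # "IC SWE" or "SWE Manager"
    payout_usd: float  # real-world payout attached to the task
    passed: bool       # True if the single (pass@1) attempt succeeded


def total_payout_earned(results):
    """Sum the payouts of successfully completed tasks across both categories."""
    return sum(r.payout_usd for r in results if r.passed)


# Toy data for illustration only; these are not real SWE-Lancer tasks.
results = [
    TaskResult("t1", "IC SWE", 250.0, True),
    TaskResult("t2", "IC SWE", 1000.0, False),
    TaskResult("t3", "SWE Manager", 500.0, True),
]

print(total_payout_earned(results))
```

Failed tasks contribute nothing, so the metric weights each task by its dollar value rather than counting tasks equally.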

Resolution Criterion

This market resolves to YES if the State-of-the-Art (SotA) Total Payout Earned on the SWE-Lancer Diamond set is credibly reported to have reached or surpassed $400,000.00 USD within the reporting window.

Otherwise, the market resolves to NO.
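
The criterion can be sketched as a simple check. This is one reading of the rules above, not an official resolver: it treats "before December 31st, 2025, 23:59 UTC" as a strict cutoff, and the function name and the example report dates are assumptions.

```python
from datetime import datetime, timezone

TARGET_USD = 400_000.00
# Assumption: "before ... 23:59 UTC" is read as a strict less-than cutoff.
WINDOW_END = datetime(2025, 12, 31, 23, 59, tzinfo=timezone.utc)


def resolves_yes(best_reported_usd, reported_at):
    """Return True if a credibly reported score meets the target inside the window."""
    return best_reported_usd >= TARGET_USD and reported_at < WINDOW_END


# The paper's SotA of $208,050 falls short whenever it was reported.
# The dates below are hypothetical, chosen only to exercise the check.
print(resolves_yes(208_050.00, datetime(2025, 3, 1, tzinfo=timezone.utc)))
print(resolves_yes(400_000.00, datetime(2025, 6, 1, tzinfo=timezone.utc)))
```

Whether a report counts as "credible" is a judgment call for the market creator and is not modeled here.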

Market Closing Date

The market will close on January 15, 2026, 00:00 UTC, to allow for potential reporting delays. It may resolve earlier if the YES condition (>= $400,000.00 USD reported) is met and confirmed before this date.


🏅 Top traders

#  Trader  Total profit
1  Ṁ2,045
2  Ṁ367
3  Ṁ233
4  Ṁ186
5  Ṁ92

@mods resolve no, current SOTA is GPT 5.2 Thinking at 74.6% -> 500800*0.746 = 373596.8 < 400K
https://openai.com/index/introducing-gpt-5-2/
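
The arithmetic in the comment above can be reproduced with a short sketch. Multiplying the pool by the pass rate assumes the pass rate translates directly into a share of the dollar pool, which is the commenter's approximation rather than a reported payout figure; the function name is hypothetical.

```python
TOTAL_POOL_USD = 500_800  # total value of the Diamond set, from the market description
TARGET_USD = 400_000


def implied_payout(pass_rate):
    """Approximate earnings if the pass rate mapped directly onto the dollar pool."""
    return TOTAL_POOL_USD * pass_rate


implied = implied_payout(0.746)          # 74.6% figure quoted in the comment
required = TARGET_USD / TOTAL_POOL_USD   # share of the pool needed for YES

print(f"implied payout: ${implied:,.2f}")
print(f"required share: {required:.2%}")
print("meets target:", implied >= TARGET_USD)
```

Under this approximation, 74.6% falls below the roughly 79.9% share of the pool that the $400,000 target requires.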

@prismatic Thanks. This only covers the IC subset (and drops some of the problems - OpenAI says "For SWE-Lancer, we omit 40/237 problems that did not run on our infrastructure"), but it is the strongest result I'm aware of that's been publicly reported, so I'm happy to resolve to No.

bought Ṁ500 NO