MANIFOLD
Top Multi-SWE-bench score in 2025?
resolved Jan 30
Resolved: 20 - 39%

Answer       Probability
20 - 39%     100% (87%)
0 - 19%      0.9%
40 - 59%     7%
60 - 79%     3%
80 - 100%    1.8%

SWE-bench is a great AI benchmark, but it is Python-only. Multi-SWE-bench is the same thing with multiple programming languages: C, C++, Java, JavaScript, TypeScript, Go, Rust.

A Claude 3.7 Sonnet-based agent achieved a score of 19% on 2025-03-29, which is currently the best score. The score will be rounded before resolution ("rounding half up", to be exact; see Rounding).
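
To illustrate how the rounding rule interacts with the answer ranges of this market, here is a minimal Python sketch. The helper names `round_half_up` and `bucket_for` are hypothetical, and the sketch assumes the ranges are inclusive on both ends. Note that Python's built-in `round()` rounds half to even, so the sketch uses `decimal` with `ROUND_HALF_UP` instead.

```python
from decimal import Decimal, ROUND_HALF_UP

# Answer ranges of this market, in percent (assumed inclusive on both ends).
BUCKETS = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 100)]


def round_half_up(score: str) -> int:
    """Round a percentage given as a decimal string to an integer, half up."""
    return int(Decimal(score).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def bucket_for(score: str) -> str:
    """Return the answer range that the rounded score falls into."""
    rounded = round_half_up(score)
    for low, high in BUCKETS:
        if low <= rounded <= high:
            return f"{low} - {high}%"
    raise ValueError(f"score out of range: {score}")


if __name__ == "__main__":
    # 19.5 rounds up to 20, so it would land in the second range.
    print(bucket_for("19.5"))
    # The 21.62 leaderboard entry mentioned in the comments rounds to 22.
    print(bucket_for("21.62"))
```

Passing scores as strings avoids binary floating-point surprises near the .5 boundary.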

Resolution will be based primarily on the official leaderboard, but announcements from other reputable organizations will also be considered.

See also /SG/top-swebench-verified-score-in-2025

🏅 Top traders

#    Total profit
1    Ṁ4,452
2    Ṁ1,266
3    Ṁ863
4    Ṁ450
5    Ṁ319
@SanghyeonSeo this can either resolve N/A or resolve 20-39%, right? (No updates, but MopenHands + Gemini-2.5-Pro is listed at 21.62)

It stopped being measured

Have you tried gemini 2.5 pro experimental on it yet?

@ian The leaderboard on the website shows something with Gemini 2.5 Pro at 21.62%:

https://multi-swe-bench.github.io/#/

(Not sure what Mopenhands is...)
