By what factor will the cost for SotA SWE-agents drop from 2024 to 2025?
resolved Sep 1
<250x: 31% (resolved 100%)
<2x: 7%
<10x: 11%
<50x: 14%
>=250x: 37%

Algorithmic progress can be measured by reduction in cost to achieve equivalent performance. SWE-bench-lite is a popular benchmark for measuring scaffolded-LLM SWE capabilities.

By what factor will the cost of SWE-bench-lite SotA drop between mid-2024 and mid-2025? Mid-2024 SotA is 43% costing $2,700 (per the devs), so this question will resolve Yes on the answer which most tightly bounds the reduction in cost to achieve 43% on July 1, 2025.

E.g. if in June 2025, 43% on SWE-bench-lite costs $500, then that'd be a 5.4x reduction and the question would resolve to (2) "<10x".
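The resolution rule can be sketched in a few lines of Python. This is only an illustration of the "most tightly bounds" wording, not the creator's actual procedure; the names `BUCKETS`, `reduction_factor`, and `resolve_bucket` are made up for this example, and the bounds are taken from the answer options.

```python
# Answer buckets as (exclusive upper bound, label), ordered tightest first.
# Labels and bounds are copied from the market's answer options.
BUCKETS = [(2, "<2x"), (10, "<10x"), (50, "<50x"), (250, "<250x")]


def reduction_factor(baseline_cost, new_cost):
    """How many times cheaper the new cost is than the baseline."""
    return baseline_cost / new_cost


def resolve_bucket(factor):
    """Return the tightest answer whose bound the factor satisfies."""
    for upper, label in BUCKETS:
        if factor < upper:
            return label
    return ">=250x"


# The worked example from the description: $2,700 falling to $500.
factor = reduction_factor(2700, 500)
print(f"{factor:.1f}x -> {resolve_bucket(factor)}")  # 5.4x -> <10x
```

A factor of exactly 250 lands in ">=250x", since every "<" bound is treated as strict.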

  • Update 2025-08-31 (PST) (AI summary of creator comment):

    • Revised baseline (mid-2024): Uses Alibaba's Lingma Agent with Claude 3.5 Sonnet at 38% costing about $2.18/problem, replacing the previously stated $2,700 for 43% on SWE-bench-lite.

    • 2025 cost will be inferred via proxies (not strictly "43% on SWE-bench-lite on July 1, 2025"): e.g., GPT-5 Nano at about $0.01/instance on full SWE-bench and model price changes like Qwen3 a3b coder ($0.08 per million output tokens) vs Sonnet 3.5 ($15 per million output tokens).

    • Estimated reduction is ~50x–250x; absent compelling contrary evidence, the market will be resolved to "<250x" soon.
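For reference, the two proxy ratios behind that estimate can be reproduced as below. This is a rough sketch using only the figures quoted in the creator's comment, with prices read as dollars per million output tokens. The variable names are illustrative. The token-price ratio only maps onto per-problem cost under the assumption (not stated in the source) that token usage per problem is similar across models, and the two per-problem figures come from different benchmarks (SWE-bench-lite vs full SWE-bench).

```python
# Proxy 1: per-problem agent cost (figures from the creator's comment).
baseline_cost_per_problem = 2.18    # Lingma Agent + 3.5 Sonnet, mid-2024
gpt5_nano_cost_per_instance = 0.01  # GPT-5 Nano on full SWE-bench

# Proxy 2: output-token price, in dollars per million output tokens.
sonnet_35_output_price = 15.00
qwen3_coder_output_price = 0.08

agent_cost_ratio = baseline_cost_per_problem / gpt5_nano_cost_per_instance
token_price_ratio = sonnet_35_output_price / qwen3_coder_output_price

print(f"agent cost proxy:  {agent_cost_ratio:.1f}x")   # 218.0x
print(f"token price proxy: {token_price_ratio:.1f}x")  # 187.5x

# Both proxies fall inside the stated (50x, 250x) range.
in_range = all(50 <= r < 250 for r in (agent_cost_ratio, token_price_ratio))
print(in_range)  # True
```

Both ratios are close to the "roughly 200x" figure and sit below 250, which is consistent with resolving to "<250x".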


This resolution is a bit more speculative than I would have liked, but here's my current guess:

https://www.swebench.com/ shows mid-2024 SotA was Alibaba's Lingma Agent at 33%, but looking at their paper, they spent $2.18 per problem to achieve 38% with 3.5 Sonnet. So that's our mid-2024 baseline cost.

Now it gets a bit trickier for 2025, but a ballpark (outside our time horizon) is that GPT-5 Nano solves 3X% of the (probably harder) complete SWE-bench at $0.01 per instance. That's a roughly 200x reduction.

I believe qwen3 a3b coder was released after the cutoff and costs $0.08 per million output tokens, whereas 3.5 Sonnet cost $15 per million, again roughly a 200x drop. I'm not clear on what the exact scaffold+model SotA was in June 2025, but I think that gives us strong evidence that this was around 200x.

So I'm very unsure about the exact value, but my 80% CI is roughly (50x, 250x). Unless anyone provides compelling evidence against this, I'll resolve to <250x next week.

@JacobPfau holy shit, that's an insanely huge improvement if true O_O

@JacobPfau very cool, thanks for doing the work here

© Manifold Markets, Inc.