Ascension 0
Needs to win 3 consecutive runs
Savescumming does not count
The LLM agent is allowed to take notes on its failed attempts and craft hooks that generate a higher informational representation of its current state.
The LLM is not allowed to use/develop a solver to assist in play, though it is permitted to use such systems to aid in post analysis and heuristics crafting
Update 2026-04-26 (PST) (AI summary of creator comment): - If Act 4 / the Heart boss is added to STS2, defeating it will be required for a valid run
The creator's general guidance for their standard of "reliable" is: the LLM should have a >75% chance of achieving a win streak within ~2-3 hours of runtime
The creator may add additional criteria to prevent cheap model brute-forcing of the 3-win streak
Update 2026-04-27 (PST) (AI summary of creator comment): The 3-win streak must be randomly seeded (no duplicated seeds allowed).
Update 2026-04-27 (PST) (AI summary of creator comment): Clarification on what constitutes a solver (which is not allowed):
A solver is a dedicated program that directly assigns an evaluation of a deck/game state
Scripts are allowed, but should generally be restricted to context injection / formatting the game state into a richer representation
The LLM agent must be the primary source of active inference when making decisions (e.g. which card to choose), rather than delegating decisions to a dedicated engine via tool calls
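To make the scripts-vs-solver line concrete, here is a minimal sketch of the *allowed* kind of script: it only performs context injection, reformatting a raw game-state dict into richer text for the LLM to read. All field names (`floor`, `hand`, `intent`, etc.) are hypothetical, not an actual StS2 API. Note that it assigns no score and makes no recommendation; the LLM remains the source of active inference.

```python
# Hypothetical ALLOWED script: pure context injection / formatting.
# It turns a raw game-state dict into readable text for the LLM's
# context window, with no evaluation attached.

def format_combat_state(state: dict) -> str:
    """Render a raw game-state dict as a readable context block."""
    lines = [
        f"Floor {state['floor']} | HP {state['hp']}/{state['max_hp']} | "
        f"Energy {state['energy']}",
        "Hand:",
    ]
    for card in state["hand"]:
        lines.append(f"  - {card['name']} (cost {card['cost']}): {card['text']}")
    lines.append("Enemies:")
    for enemy in state["enemies"]:
        lines.append(f"  - {enemy['name']} HP {enemy['hp']}, intent: {enemy['intent']}")
    return "\n".join(lines)
```

A disallowed solver, by contrast, would return something like a numeric evaluation or a "best card" from a function like this, and the agent would merely execute its output.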
Update 2026-05-03 (PST) (AI summary of creator comment): Prompt engineering restrictions:
The initial prompt must not contain explicit instructions on what actions to perform (e.g. "aim for a poison build" is disallowed)
The initial prompt should focus on providing context about the harness and broad advice on reasoning through the game and learning from mistakes
Any additional messages after the initial prompt must be hands-off, such as: "continue playing" or "reflect on why you lost and try again"
@StatsMan that would be disallowed. The initial prompt must not have explicit instructions on what actions to perform.
The contents of the initial prompt should primarily focus on providing context about the harness and broad advice on how to reason through the game and learn from mistakes.
Any additional messages thereafter should be extremely hands-off, such as: "continue playing", "reflect on why you lost and try again", etc.
🤔 that being said, this policy can easily be circumvented by prompting something like: "look out for OP deck synergies like @poison.md" instead... so in principle no, but in practice I guess so..
@Kingfisher good question - is it allowed to call python at all, for example? Can it build software on the fly during the run?
@Kingfisher a solver would be a dedicated program that directly assigns an evaluation of a deck/game state.
Scripts are allowed, but their usage should generally be restricted to only assist with context injection / formatting the gamestate into a richer representation.
What is desirable is for the LLM agent to be the primary source of active inference when making decisions (like which card to choose) instead of making a tool call to a dedicated engine.
@Quillist
> a solver would be a dedicated program that directly assigns an evaluation of a deck/game state.
Doesn't the LLM itself count as that? If I take the game state, feed it into an LLM, and use its output to make a decision, that meets this definition - but I don't think that's what you intended. When you get down to it, the line between an LLM and a "solver" is quite blurry, since you're intending to use the LLM itself as "a solver" per this definition. Can this be made more precise?
@Kingfisher Some examples of 'solver' are programs like:
PioSolver (a Texas hold'em solver)
Stockfish (a chess engine)
For example, a program that explicitly simulates thousands of different deck configurations to calculate their hypothetical performance and uses that information to compile an evaluation function that is called during play
@Quillist Totally fair to set the resolution criteria to whatever you want, but I think you should consider making another market in which the above-defined type of "solver" is allowed, because honestly, I think I'd be more impressed if an LLM made a good solver and crushed the game than if it just stumbled through three runs.
Though perhaps I'm just overestimating how difficult it would be to write a solver for StS2.
@NBAP I think STS2 solvers are way easier than expected, the only wrench being balance changes getting in the way of long-standing data collection. Calculating average deck damage, block, energy, and other factors is the core of it. Fewer moving pieces than chess.
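The deck statistics @NBAP describes might be sketched as below. To be clear about the rules above: an evaluation function like this would count as a solver if called during play, but is permitted for post-run analysis and heuristics crafting. The card fields are hypothetical placeholders, not actual StS2 data.

```python
# Hedged sketch of the "average deck damage / block / energy" idea.
# Permitted for post-analysis only; calling this during play to pick
# cards would make it a disallowed solver. Card fields are hypothetical.

def deck_averages(deck: list[dict]) -> dict:
    """Compute per-card and per-energy averages for damage and block."""
    n = len(deck)
    total_damage = sum(c.get("damage", 0) for c in deck)
    total_block = sum(c.get("block", 0) for c in deck)
    total_cost = sum(c.get("cost", 0) for c in deck)
    return {
        "damage_per_card": total_damage / n,
        "block_per_card": total_block / n,
        # Guard against zero-cost decks when normalizing by energy.
        "damage_per_energy": total_damage / max(total_cost, 1),
        "block_per_energy": total_block / max(total_cost, 1),
    }
```

A real solver would go well beyond static averages (draw order, enemy intents, scaling), which is part of why the "fewer moving pieces than chess" claim is debatable.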
> wins 3 consecutive runs
if they add an act 4, will it require that as well? (i.e. an equivalent market for STS1 in 2026 would require 3 heart kills)
@Ziddletwix overall, 3 consecutive runs is a way lower bar than “reliably beat”, so this is a pretty low bar if someone puts the effort in (e.g., even with a 10% win rate, an LLM cranking out runs for a few weeks will easily succeed. A 10% win rate is nontrivial! But if you can teach one the ultra basics of combat, that seems pretty doable even without particularly great overall strategy).
@Ziddletwix if the heart / act 4 boss is introduced then it will be required.
I chose a 3-win streak as the bar because I didn't want to make the resolution criteria prohibitively expensive (i.e. a solution exists but it costs like $10 a run). Requiring more than 10+ attempts would begin to make this market more about costs than ability.
That being said, if someone were to use a cheap model to brute-force the 3-streak, it would be spiritually unsatisfying.
I may add additional criteria to prevent such a resolution. To give some guidance, my standard for "reliable" is pretty generous. What I would like to simulate is a situation where I can give the LLM ~2-3 hours to achieve a win streak, and it can do so with a >75% chance.
@Dssc save scumming doesn't count
The LLM agent is allowed to take notes on its failed attempts and craft hooks that generate a higher informational representation of its current state.
The LLM is not allowed to use/develop a solver to assist in play, though it is permitted to use such systems for post analysis and heuristics crafting