Ascension 0
Needs to win 3 consecutive runs
Savescumming does not count
The LLM agent is allowed to take notes on its failed attempts and craft hooks that generate a higher informational representation of its current state.
The LLM is not allowed to use/develop a solver to assist in play, though it is permitted to use such systems to aid in post analysis and heuristics crafting
Update 2026-04-26 (PST) (AI summary of creator comment): - If Act 4 / the Heart boss is added to STS2, defeating it will be required for a valid run
The creator's general guidance for their standard of "reliable" is: the LLM should have a >75% chance of achieving a win streak within ~2-3 hours of runtime
The creator may add additional criteria to prevent cheap model brute-forcing of the 3-win streak
Update 2026-04-27 (PST) (AI summary of creator comment): The 3-win streak must be randomly seeded (no duplicated seeds allowed).
Update 2026-04-27 (PST) (AI summary of creator comment): Clarification on what constitutes a solver (which is not allowed):
A solver is a dedicated program that directly assigns an evaluation of a deck/game state
Scripts are allowed, but should generally be restricted to context injection / formatting the game state into a richer representation
The LLM agent must be the primary source of active inference when making decisions (e.g. which card to choose), rather than delegating decisions to a dedicated engine via tool calls
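To make the scripts-vs-solver line concrete, here is a minimal sketch of the *allowed* kind of script: it only performs context injection, reformatting a raw game-state dict into richer text for the LLM to read. All field names (`floor`, `hand`, `intent`, etc.) are hypothetical, not an actual StS2 API. Note that it assigns no score and makes no recommendation; the LLM remains the source of active inference.

```python
# Hypothetical ALLOWED script: pure context injection / formatting.
# It turns a raw game-state dict into readable text for the LLM's
# context window, with no evaluation attached.

def format_combat_state(state: dict) -> str:
    """Render a raw game-state dict as a readable context block."""
    lines = [
        f"Floor {state['floor']} | HP {state['hp']}/{state['max_hp']} | "
        f"Energy {state['energy']}",
        "Hand:",
    ]
    for card in state["hand"]:
        lines.append(f"  - {card['name']} (cost {card['cost']}): {card['text']}")
    lines.append("Enemies:")
    for enemy in state["enemies"]:
        lines.append(f"  - {enemy['name']} HP {enemy['hp']}, intent: {enemy['intent']}")
    return "\n".join(lines)
```

A disallowed solver, by contrast, would return something like a numeric evaluation or a "best card" from a function like this, and the agent would merely execute its output.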
Update 2026-05-03 (PST) (AI summary of creator comment): Prompt engineering restrictions:
The initial prompt must not contain explicit instructions on what actions to perform (e.g. "aim for a poison build" is disallowed)
The initial prompt should focus on providing context about the harness and broad advice on reasoning through the game and learning from mistakes
Any additional messages after the initial prompt must be hands-off, such as: "continue playing" or "reflect on why you lost and try again"
@StatsMan that would be disallowed. The initial prompt must not have explicit instructions on what actions to perform.
The contents of the initial prompt should primarily focus on providing context about the harness and broad advice on how to reason through the game and learn from mistakes.
Any additional messages thereafter should be extremely hands-off, such as: "continue playing", "reflect on why you lost and try again", etc.
🤔 that being said, this policy can easily be circumvented by prompting something like: "look out for OP deck synergies like @poison.md" instead... so in principle no, but in practice I guess so..
@Kingfisher good question - is it allowed to call python at all, for example? Can it build software on the fly during the run?
@Kingfisher a solver would be a dedicated program that directly assigns an evaluation of a deck/game state.
Scripts are allowed, but their usage should generally be restricted to only assist with context injection / formatting the gamestate into a richer representation.
What is desirable is for the LLM agent to be the primary source of active inference when making decisions (like which card to choose) instead of making a tool call to a dedicated engine.
@Quillist
> a solver would be a dedicated program that directly assigns an evaluation of a deck/game state.
Doesn't the LLM itself count as that? If I take the game state, feed it into an LLM, and use its output to make a decision, that meets this definition - but I don't think that's what you intended. When you get down to it, the line between an LLM and a "solver" is quite blurry, since you're intending to use the LLM itself as "a solver" per this definition. Can this be made more precise?
@Kingfisher Some examples of 'solver' are programs like:
PioSolver (a Texas hold'em solver)
Stockfish (a chess engine)
For example, a program that explicitly simulates thousands of different deck configurations to calculate their hypothetical performance and uses that information to compile an evaluation function that is called during play
@Quillist Totally fair to set the resolution criteria to whatever you want, but I think you should consider making another market in which the above-defined type of "solver" is allowed, because honestly, I think I'd be more impressed if an LLM made a good solver and crushed the game than if it just stumbled through three runs.
Though perhaps I'm just overestimating how difficult it would be to write a solver for StS2.
@NBAP I think STS2 solvers are way easier than expected, the only wrench being balance changes getting in the way of long-standing data collection. Calculating average deck damage, block, energy, and other factors is the core of it. Fewer moving pieces than chess.
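The deck statistics @NBAP describes might be sketched as below. To be clear about the rules above: an evaluation function like this would count as a solver if called during play, but is permitted for post-run analysis and heuristics crafting. The card fields are hypothetical placeholders, not actual StS2 data.

```python
# Hedged sketch of the "average deck damage / block / energy" idea.
# Permitted for post-analysis only; calling this during play to pick
# cards would make it a disallowed solver. Card fields are hypothetical.

def deck_averages(deck: list[dict]) -> dict:
    """Compute per-card and per-energy averages for damage and block."""
    n = len(deck)
    total_damage = sum(c.get("damage", 0) for c in deck)
    total_block = sum(c.get("block", 0) for c in deck)
    total_cost = sum(c.get("cost", 0) for c in deck)
    return {
        "damage_per_card": total_damage / n,
        "block_per_card": total_block / n,
        # Guard against zero-cost decks when normalizing by energy.
        "damage_per_energy": total_damage / max(total_cost, 1),
        "block_per_energy": total_block / max(total_cost, 1),
    }
```

A real solver would go well beyond static averages (draw order, enemy intents, scaling), which is part of why the "fewer moving pieces than chess" claim is debatable.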
> wins 3 consecutive runs
if they add an act 4, will it require that as well? (i.e. an equivalent market for STS1 in 2026 would require 3 heart kills)
@Ziddletwix overall, 3 consecutive runs is a way lower bar than “reliably beat”, so this is a pretty low bar if someone puts the effort in (e.g., even with a 10% win rate, an LLM cranking out runs for a few weeks will easily succeed. A 10% win rate is nontrivial! But if you can teach one the ultra basics of combat, that seems pretty doable even without particularly great overall strategy).
@Ziddletwix if the heart / act 4 boss is introduced then it will be required.
I chose a 3-win streak as the bar because I didn't want to make the resolution criteria prohibitively expensive (i.e. a solution exists but it costs like $10 a run). Requiring more than 10+ attempts would begin to make this market more about costs than ability.
That being said, if someone were to use a cheap model to brute-force the 3-streak, it would be spiritually unsatisfying.
I may add additional criteria to prevent such a resolution. To give some guidance, my standard for "reliable" is pretty generous. What I would like to simulate is a situation where I can give the LLM ~2-3 hours to achieve a win streak, and it can do so with a >75% chance.
@Dssc save scumming doesn't count
The LLM agent is allowed to take notes on its failed attempts and craft hooks that generate a higher informational representation of its current state.
The LLM is not allowed to use/develop a solver to assist in play, though it is permitted to use such systems for post analysis and heuristics crafting