The LLM must not be limited to playing chess: for example, an LLM that can both play chess and hold a conversation counts. However, it cannot make use of a non-LLM-based chess engine or other non-LLM computation, such as a calculator or a Python script.
A super grandmaster has a FIDE Elo rating of 2700 or more.
The super GM must be trying to win the game.
The game format is unspecified: the GM can play blindfold, play on a board whose moves are relayed to the LLM in standard notation, or use any other arrangement.
Duplicate, with different resolution criteria, of:
The following AI-generated updates are not binding and may contain mistakes. I will occasionally try to add clarifications to the description reflecting changes prompted by user questions.
Update 2025-06-19 (PST) (AI summary of creator comment): The creator has specified that they would count an AI that is like a 'human brain in AI form' for the purposes of this market. This indicates a broad interpretation of what constitutes a 'large language model'.
Update 2025-06-20 (PST) (AI summary of creator comment): The creator has specified which tools and inputs are permitted for the LLM:
Allowed: A scratchpad for memory and calls to other instances of the same model.
Not allowed: A board evaluation function or an opening book provided in the prompt/context.
Update 2025-06-20 (PST) (AI summary of creator comment): The creator has stated they lean towards allowing the LLM's output to be constrained to only output valid moves, viewing this as a minor form of assistance.
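As an illustration only, here is a minimal sketch of what constraining output to valid moves could look like, assuming a hypothetical generate_candidates function that returns the model's ranked move suggestions in standard algebraic notation; the legality check uses the python-chess library, which validates moves but does not evaluate them, so the chess reasoning still comes from the model. Nothing in this sketch is prescribed by the creator.

```python
# Minimal sketch: filter an LLM's suggested moves down to the first legal one.
# generate_candidates(fen) is a hypothetical function returning the model's
# ranked move suggestions in SAN, e.g. ["Nf3", "e4", ...].
import chess

def constrained_move(board: chess.Board, generate_candidates) -> chess.Move:
    for san in generate_candidates(board.fen()):
        try:
            # parse_san raises a ValueError subclass if the move is illegal
            # or malformed in the current position.
            return board.parse_san(san)
        except ValueError:
            continue  # skip hallucinated or illegal suggestions
    raise RuntimeError("model produced no legal move")
```

A filter like this only rejects illegal or malformed output; it never chooses between legal moves, so all of the chess strength (or weakness) remains the model's own.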
Update 2025-06-20 (PST) (AI summary of creator comment): The creator has specified that a hybrid model, such as a chimera network, will not count if it contains a non-LLM sub-network specialized for chess.
Update 2025-06-20 (PST) (AI summary of creator comment): The creator has specified that a model architecture containing a sub-network that is a chess-specialized LLM is allowed. This sub-network can be a transformer specifically trained on chess-related "language" such as PGNs or FENs.
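For readers unfamiliar with these notations, the snippet below shows what that chess "language" looks like; it uses the python-chess library purely for illustration and is not something specified by the creator. FEN encodes a single position as one line of text, while PGN encodes a whole game as move text, so both can be tokenized like any other string.

```python
# Illustration of the two chess "languages" mentioned above.
import chess
import chess.pgn

board = chess.Board()
for san in ["e4", "e5", "Nf3", "Nc6", "Bb5"]:  # a short Ruy Lopez opening
    board.push_san(san)

# FEN: the current position as a single text line.
print(board.fen())
# r1bqkbnr/pppp1ppp/2n5/1B2p3/4P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3

# PGN move text: the whole game so far.
game = chess.pgn.Game.from_board(board)
print(game.mainline_moves())
# 1. e4 e5 2. Nf3 Nc6 3. Bb5
```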
Update 2025-06-20 (PST) (AI summary of creator comment): The creator has specified that in a hybrid model, a sub-network specialized for chess must also understand English. A model where the chess sub-network is so specialized that it does not know English is considered an 'unpleasant edge-case' that may not count.
Update 2025-06-20 (PST) (AI summary of creator comment): The creator has clarified the criteria for hybrid models with specialized sub-networks:
Any sub-network, even one specialized for chess, must also be able to understand and use a human language other than chess notation (e.g., English).
Update 2025-06-20 (PST) (AI summary of creator comment): The creator has specified that the model is not restricted to text-only interactions. As long as the model uses language, other modalities such as audio-to-audio are permitted.
Update 2025-06-20 (PST) (AI summary of creator comment): The creator has provided additional specifications for the model:
The model can be multimodal (e.g., using visual or audio inputs); it is not restricted to text only.
The model is not required to be autoregressive. Models with different architectures or training methods will be considered.
Update 2025-06-21 (PST) (AI summary of creator comment): The creator has provided further clarification on the architecture of qualifying hybrid or Mixture-of-Experts (MoE) models:
Likely to count: Models where specialized experts are trained together and specialization is emergent.
Unlikely to count: Systems where models are trained completely separately and a router or classifier directs prompts to a specialized chess model.
Update 2025-06-21 (PST) (AI summary of creator comment): In response to concerns that an LLM could use unlimited time to search the game tree, the creator is now considering imposing time controls on the match. Potential options mentioned include:
A specific time limit per move (e.g., 5 minutes).
Letting the super grandmaster decide the time controls for the match.