As of December 31, 2029, will a large language model (LLM), defined as a transformer-based, next-token prediction model, comprise at least 75% of the activation parameter count of the most capable, publicly-known general-purpose AI system?
Definitions:
LLM: A model whose main pre-training objective is next-token prediction and whose architecture is based primarily on transformers (whether dense, sparse, mixture-of-experts (MoE), or a similar variant).
Activation parameters: The total number of trainable weights that must be loaded in memory for a maximum-capability inference pass. For MoE models, count the union of all experts that could be activated in any inference pass, not just the subset active for an average token.
≥ 75% rule: The criterion is met if one or more LLMs together account for at least 75% of the system's total activation parameters, summed across all neural modules (vision, planning, and any others); an illustrative calculation follows these definitions.
Most capable general-purpose AI: The system that, as of December 31, 2029, demonstrates the highest publicly documented cross-domain performance (as measured by recognized AGI or multitask benchmarks) or is widely acknowledged as top-tier by expert consensus.
Backbone: The neural component(s) that provide broad reasoning and general knowledge. Symbolic planners or retrieval databases without trainable weights are not counted.
Publicly-known: The system must be openly released or credibly leaked with reproducible technical details, such as a model card, parameter count, architecture, or benchmark results.
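
To make the counting rule concrete, here is a minimal sketch of the ≥ 75% calculation, assuming hypothetical module names and parameter counts; actual resolution would rely on publicly documented figures.

```python
from dataclasses import dataclass

@dataclass
class Module:
    name: str
    params_billions: float  # trainable weights loadable for a maximum-capability pass
    is_llm: bool            # True if transformer-based, next-token-prediction model

def llm_share(modules: list[Module]) -> float:
    """Fraction of the system's activation parameters contributed by LLM modules."""
    total = sum(m.params_billions for m in modules)
    llm = sum(m.params_billions for m in modules if m.is_llm)
    return llm / total

# Hypothetical example system: a MoE LLM backbone (union of all experts),
# a vision encoder, and a learned planner.
system = [
    Module("moe_llm_backbone", 1000.0, is_llm=True),   # all experts counted
    Module("vision_encoder",     90.0, is_llm=False),
    Module("neural_planner",     60.0, is_llm=False),
]

share = llm_share(system)
print(f"LLM share: {share:.1%}")        # ~87.0% in this hypothetical
print("Criterion met:", share >= 0.75)  # True
```

In this made-up example the LLM contributes 1000B of 1150B activation parameters (about 87%), so the question would resolve Yes.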
Edge-case clarifications:
Mixture-of-Experts (MoE) LLMs: All possible experts count toward the parameter total, even if only a subset is active per token (applied in the sketch after this list).
Retrieval-Augmented Generation (RAG) or external databases: Non-parametric resources (e.g., vector DBs) are ignored for parameter counting; only neural weights matter.
Controller LLM plus a non-LLM core (e.g., a physics simulator): If the non-LLM neural weights exceed 25% of total activation parameters, the criterion is not met.
Systems distilled from an LLM into a non-transformer architecture (e.g., Mamba, RWKV): The distilled model does not count as an LLM, even if it was originally derived from one.
Neuro-symbolic or hybrid systems: Only neural parameters are counted. If LLMs make up less than 75% of that neural total, the answer is "No."
Multiple LLM agents: Combine the weights of all constituent LLMs when computing the LLM total.
Quantized or adapted LLMs: Count the number of original trainable weights, regardless of numeric precision or quantization.
Leaked systems without parameter evidence: If the parameter count cannot be established, the answer is "No" (the burden of proof is on "Yes").
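
For illustration, the sketch below applies several of these edge-case rules in one counting pass; the module names, architectures, and parameter figures are hypothetical.

```python
def counts_as_llm(module: dict) -> bool:
    """LLM status requires a transformer architecture and a next-token objective;
    distillation into a non-transformer (e.g., Mamba, RWKV) disqualifies a model."""
    return module["architecture"] == "transformer" and module["objective"] == "next-token"

def activation_params(module: dict) -> float:
    """Original trainable weight count: precision/quantization is ignored,
    and MoE totals are the union of all experts."""
    return module["trainable_params_billions"]

system = [
    # Two LLM agents: both sets of weights count toward the LLM total.
    {"name": "planner_llm", "architecture": "transformer", "objective": "next-token",
     "trainable_params_billions": 400.0},
    {"name": "worker_llm", "architecture": "transformer", "objective": "next-token",
     "trainable_params_billions": 400.0},
    # Distilled state-space model: neural, but not an LLM under these rules.
    {"name": "distilled_ssm", "architecture": "mamba", "objective": "next-token",
     "trainable_params_billions": 150.0},
    # A retrieval vector DB has no trainable weights, so it is omitted entirely.
]

total = sum(activation_params(m) for m in system)
llm = sum(activation_params(m) for m in system if counts_as_llm(m))
print(f"LLM share: {llm / total:.1%}")       # ~84.2%
print("Resolves Yes?", llm / total >= 0.75)  # True in this hypothetical
```

In this made-up configuration the two LLM agents jointly hold 800B of 950B counted parameters (about 84%), so the criterion would be met despite the non-transformer module.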