Will an LLM agent complete >50% of the lab tasks on the Factorio Learning Environment benchmark in 2025?

Resolved YES on Jan 1

Large Language Models (LLMs) are rapidly saturating existing benchmarks, necessitating new open-ended evaluations. We introduce the Factorio Learning Environment (FLE), based on the game of Factorio, that tests agents in long-term planning, program synthesis, and resource optimization.

https://jackhopkins.github.io/factorio-learning-environment/

As of the time of this market's creation, Claude 3.5 Sonnet tops the leaderboard with a success rate of 21.9%. This market resolves YES if the leaderboard displays an entry with >50% success before the end of 2025, NO otherwise. Claims of higher success rates won't count for resolution unless they're displayed on the leaderboard. Autonomous systems that are not based on an LLM agent framework also won't count for resolution.

Update 2025-12-31 (PST) (AI summary of creator comment): The creator has clarified that results shown on https://jackhopkins.github.io/factorio-learning-environment/versions/0.3.0.html will count for resolution purposes, even though the main leaderboard link has not been updated. The market will follow the spirit of the question rather than requiring the specific leaderboard page mentioned in the original description to be updated.

Technology

Technical AI Timelines

OpenAI

LLMs

🏅 Top traders

#	Trader	Total profit
1		Ṁ91
2		Ṁ83
3		Ṁ23

3 Comments

10 Holders

15 Trades

@creator can I get clarification on how literally you are interpreting your terms?

https://jackhopkins.github.io/factorio-learning-environment/leaderboard/ appears to no longer be updated.

https://jackhopkins.github.io/factorio-learning-environment/versions/0.3.0.html

shows that the 50% mark has been cleared. So, thoughts?

@MRME Good catch. I'll follow the spirit of the question and say that the results shown on https://jackhopkins.github.io/factorio-learning-environment/versions/0.3.0.html count even though the leaderboard link hasn't been updated.