MANIFOLD
Top OSWorld score in 2025?
resolved Jan 2
Resolved: ~65.0%

  • Above 50%: Resolved YES

  • Above 60%: Resolved YES

  • Above 70%: Resolved NO

  • Above 80%: Resolved NO

  • Above 90%: Resolved NO

Background

OSWorld is a benchmark for evaluating multimodal AI agents on real-world computer tasks in open-ended environments. It tests an AI's ability to navigate operating systems, use applications, and complete practical tasks through a combination of vision and text inputs/outputs.

As of January 24, 2025, the highest OSWorld score is held by OpenAI CUA (200 steps) with a score of 38.1. Other notable scores include:

  • UI-TARS-72B-DPO (50 steps): 24.6

  • UI-TARS-72B-DPO (15 steps): 22.7

  • Claude 3.5 Sonnet (50 steps): 22.0

Resolution Criteria

This market will resolve to the highest verified OSWorld score achieved by any AI model during the 2025 calendar year (January 1, 2025 to December 31, 2025). The score must be publicly reported and verifiable through official sources such as the OSWorld leaderboard, academic publications, or credible tech news outlets.

If multiple models achieve the same highest score, the market will resolve to that score. If scores are reported with different decimal precisions, they will be considered at their reported precision.
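As a rough illustration of how the threshold answers follow from a single top score, here is a minimal Python sketch. The function name `resolve_thresholds` is hypothetical, and the strict "greater than" reading of "Above" is an assumption; the market text does not spell out how a score exactly equal to a threshold would be handled.

```python
from typing import Dict, Iterable


def resolve_thresholds(top_score: float, thresholds: Iterable[float]) -> Dict[str, str]:
    """Map each "Above N%" answer to YES or NO for a given top score.

    Assumption: "Above" is read as strictly greater than the threshold.
    """
    return {
        f"Above {t:g}%": "YES" if top_score > t else "NO"
        for t in thresholds
    }


if __name__ == "__main__":
    # 65.0 is the approximate resolved value shown at the top of this page.
    for answer, outcome in resolve_thresholds(65.0, [50, 60, 70, 80, 90]).items():
        print(answer, "->", outcome)
```

Under this reading, a top score of about 65.0 gives YES for the 50% and 60% answers and NO for the rest. The 72.6 figure disputed in the comments would instead also put "Above 70%" at YES, which is why the model-versus-system question matters for resolution.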

🏅 Top traders

#  Total profit
1  Ṁ2,626
2  Ṁ1,430
3  Ṁ486
4  Ṁ406
5  Ṁ231

@SG resolve no for all

@prismatic huh? It's 72.6% if you click "all". Resolution doesn't say limited to foundation e2e

@Usaar33 it says "any AI model" and the 72.6% score is from an AI system, not AI model by the conventional meaning of those terms

Will OSWorld Verified count toward this market? Seems that it's what labs are using now:

https://x.com/justjoshinyou13/status/1972721166277111948

@SG 50% and 60% resolve YES if you do count OSWorld Verified (and i def think you should)

© Manifold Markets, Inc.