
While o1-preview outperformed previous models in math and programming, it wasn't preferred over GPT-4o in the "personal writing" domain.

However, it seems like models that think before they respond should, in principle, be able to improve their writing significantly. By analogy, if I had to write a poem off the top of my head, it would be much worse than a poem I took an hour to write. It seems that o1 and o3 are primarily trained to solve problems with a definitively correct answer, but I don't think this will be the case forever.
For the purposes of this question, "large reasoning models" (LRMs) are LLMs that are trained to generate intermediate outputs that are not part of the final answer. This could include chain-of-thought training or "continuous thought" techniques like Meta's COCONUT.
Creative writing could include poetry, short stories, novel outlines, comedy routines, and similar things.
Things I might look for to determine a resolution:
The "Creative Writing" category in https://lmarena.ai/?leaderboard. If it's clear that LRMs outperform other models, the market will likely resolve YES. However, if it's simply the case that all "serious" models are LRMs and the "base LLM" version of those models isn't even available, then this wouldn't count unless I think the reasoning is really the cause of the performance gap.
If an LRM has the option to modulate the length of its reasoning, I'd look for metrics showing that increased test-time compute corresponds to a marked increase in response quality in the creative writing domain.
In the absence of good metrics, I'd rely on general vibes based on the experience of myself and others.
Ultimately, I'm thinking about a scenario where LRMs are clearly better for creative writing than traditional LLMs, in the same way that o1 is clearly better than GPT-4o at (well-specified) coding problems. If that bar hasn't been reached, this market should resolve NO.
I won't trade in this market.
Update 2026-01-02 (PST) (AI summary of creator comment): The creator is leaning towards a NO resolution based on current evidence:
- All serious modern LLMs are now reasoning models, making it difficult to compare reasoning vs. non-reasoning models
- The fact that reasoning models like Gemini 3 Pro top the LMArena creative writing leaderboard doesn't demonstrate that reasoning itself improves creative writing
- Evidence suggests models like GPT-5 and r1 are not actually better at creative writing due to their reasoning capabilities
The creator is opening the floor to objections from traders before making a final resolution decision.
I think it's pretty unclear that reasoning models are better than non-reasoning models at creative writing. All of the serious modern LLMs are in fact reasoning models, so the fact that Gemini 3 Pro is at the top of LMArena for creative writing doesn't mean much.
nostalgebraist's posts on this, e.g. "the improvised bureaucracy of wonder", are pretty convincing to me that reasoning models like GPT-5 and r1 are not actually better at creative writing due to their reasoning, despite the comment I made on this market earlier.
Since there isn't clear evidence, I'm leaning towards a NO resolution, but I'm opening the floor to objections @traders
@MingCat Yeah, I also think this is probably true. DeepSeek r1 is really good at certain writing tasks in my experience. This also checks out when you look at the LMSYS leaderboard for creative writing.

I'm not in a rush to resolve this market, but I think it's quite likely to resolve YES. I'll wait until there's a broader consensus about this.
