On the task described in https://arxiv.org/abs/2509.09677, what will be the length of tasks that Gemini 3 will be able to complete in one go?
I'm an author of the paper, and I will run the same setup as above to resolve this.
Currently:
GPT-5 Thinking is 1024
Claude 4 Sonnet is 432
Grok-4 is 384
Update 2026-01-10 (PST) (AI summary of creator comment): The creator is attempting to resolve this market to N/A due to ambiguity in the original question - it did not specify whether Gemini 3 Flash or Gemini 3 Pro would be tested.
@JaySocrates I am trying to resolve this to N/A
Because we did not specify Gemini 3 Flash vs Pro, which was a dumb mistake at the time. But I get this error:
error: ON CONFLICT DO UPDATE command cannot affect row a second time at Parser.parseErrorMessage (/usr/src/app/node_modules/pg-protocol/src/parser.ts:368:69) at ...(node:internal/async_hooks:130:17)