This market should have resolved YES months ago. What's going on? I don't have any bets in it, to be clear.
There aren't that many different types of undergrad math class these days, and each one doesn't have that many different types of exam question. Models can easily memorize the O(1000) "tricks" needed to ace these exams.
@pietrokc The question is not "can it" but "will it". Someone must actually run the experiment, and I have to see the results.
@vluzko That seems kind of disingenuous. You can easily run this experiment yourself using one of the good free models, like GPT Thinking. I'd honestly be surprised if you can find any undergraduate math exam from MIT or Harvard that GPT 5.4 Thinking or Gemini 3.1 Pro do not ace.
@pietrokc I disagree that asking this question commits me to grading a proofs exam myself. I can do that, but I think it's unreasonable to restrict the set of people who can ask questions like this to "people who can run the verification experiment themselves" rather than "people who can assess whether someone else ran it".
@pietrokc That's probably true. It is not really the intent of the question (I made this market before LLMs had really taken off and did not think of the training data issue), but I don't think it can be avoided. Some time in the vaguely near future I'll try to find a proofs exam and test this.