Post methodology: Claude 4.0 via custom Dust assistant @TDep-SubstackPost with the system prompt: "Please read the text of the podcast transcript in the prompt and write a short post that summarizes the main points and incorporates any recent news articles that provide helpful context for the interview. Please make the post as concise as possible and avoid academic language or footnotes. Put any linked articles or tweets inline in the text. Refer to podcast guests by their first names after the initial mention." Light editing and reformatting for the Substack editor.
OpenAI just achieved something the AI community thought was still years away: winning gold at the International Mathematical Olympiad (IMO). What makes this breakthrough even more remarkable is that it came from a scrappy three-person team—Alex, Cheryl, and Noam—who pulled off the final push in just two months.
The achievement represents a big leap in AI reasoning capabilities, one that builds on the work of many other teams at OpenAI. Models were still struggling with grade-school math problems just a few years ago; since then, AI has rapidly blown through mathematical benchmarks, from GSM8K to AIME to USAMO and now IMO gold.
The Technical Breakthrough
What sets this system apart isn't just raw computational power (Google DeepMind and Harmonic also got gold); it's the approach to "hard to verify" tasks. The model can think for extended periods and spread its work across parallel compute via multi-agent systems. But perhaps most impressively, when faced with the IMO's notoriously difficult Problem 6, the system didn't hallucinate a fake solution. Instead, it acknowledged it couldn't solve it, a level of self-awareness that previous models lacked.
"It was good to see the model doesn't try to hallucinate or make up some solution, but instead will say 'no answer,'" Cheryl explained during the podcast. This honesty addresses a major complaint from mathematicians who previously had to carefully fact-check every AI-generated proof for subtle errors.
From Competition to Real Research
The team deliberately chose a general-purpose approach over formal verification tools like Lean, prioritizing techniques that could transfer beyond competition math. As Noam noted, their methods are already being incorporated into other OpenAI systems, potentially improving everything from ChatGPT to their upcoming agent capabilities.
The gap between competition problems (which take students about 1.5 hours each) and research breakthroughs (which require thousands of hours) remains vast. But as Noam pointed out, the pace of progress in AI reasoning has been accelerating dramatically.
What's Next
While Millennium Prize problems remain far off, the team is already looking beyond time-boxed competitions toward problems that require deeper, longer-term thinking. They're also working to make the system available to mathematicians so it can augment human mathematical discovery through collaborative research.
The achievement serves as another reminder that in AI, the impossible often becomes inevitable faster than anyone expects. What seemed like a distant goal just 15 months ago (when the team estimated only 12% odds of success) became reality through focused engineering and the relentless pace of AI progress.
For a field that measures breakthroughs in months rather than years, OpenAI's IMO gold isn't just a milestone—it's a preview of the reasoning capabilities heading our way.
Hosted by Sonya Huang
Mentioned in this episode:
IMO: Official page of the International Mathematical Olympiad (problems here)
OpenAI IMO 2025 Proofs: Solutions to the five problems OpenAI completed
Alexander’s X thread: Announcement of the results (also see Noam’s LinkedIn post)