The study compares OpenAI's o1-mini and o3-mini models on math problems and finds that o3-mini outperformed o1-mini without needing longer reasoning chains.
Response accuracy declined as the reasoning chains grew, indicating that 'thinking harder' isn't the same as 'thinking longer'.
Newer reasoning models use compute more effectively, resulting in a smaller accuracy drop for unsolvable problems.
OpenAI plans to unify the o-series and GPT-series models with the release of GPT-5, and the o3 model will not be released as a standalone model.