Apple AI researchers have found that large reasoning models break down on complex problems, challenging assumptions about progress toward artificial general intelligence (AGI).
The authors tested reasoning models such as OpenAI’s o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking in controllable puzzle environments where problem complexity could be scaled systematically.
The reasoning models handled moderately complex problems well but struggled as complexity grew, hitting a threshold beyond which their accuracy collapsed.
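To make the setup concrete, here is a minimal sketch of such a complexity sweep, using Tower of Hanoi (one of the puzzles in the study) as the environment. The `query_model` function is a hypothetical stand-in for a real model API, and the prompt and parsing details are assumptions, not the authors' exact protocol; only the idea of scaling puzzle size and exactly verifying the answer mirrors the paper.

```python
"""Sketch: sweep puzzle complexity, ask a model for a full solution,
and check it with an exact verifier, watching for the accuracy collapse
the study reports. `query_model` is a hypothetical placeholder."""

from typing import List, Tuple

Move = Tuple[int, int]  # (from_peg, to_peg), pegs numbered 0..2


def verify_hanoi(n_disks: int, moves: List[Move]) -> bool:
    """Replay a move sequence and check it legally solves n-disk Hanoi."""
    pegs = [list(range(n_disks, 0, -1)), [], []]  # peg 0 holds all disks
    for src, dst in moves:
        if not pegs[src]:
            return False                       # moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                       # larger disk onto smaller
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n_disks, 0, -1))  # all disks on peg 2


def query_model(prompt: str) -> List[Move]:
    """Hypothetical LLM call; swap in a real API client plus a parser
    that extracts (from_peg, to_peg) pairs from the model's answer."""
    raise NotImplementedError


def accuracy_vs_complexity(max_disks: int, trials: int = 10) -> dict:
    """Sweep puzzle size and record solve rate to locate the collapse."""
    results = {}
    for n in range(1, max_disks + 1):
        prompt = (f"Solve Tower of Hanoi with {n} disks. "
                  "List moves as (from_peg, to_peg) pairs, pegs 0-2.")
        solved = sum(verify_hanoi(n, query_model(prompt))
                     for _ in range(trials))
        results[n] = solved / trials
    return results


if __name__ == "__main__":
    # Verifier sanity check with the known 2-disk solution.
    assert verify_hanoi(2, [(0, 1), (0, 2), (1, 2)])
```

Because every answer is checked by an exact verifier rather than judged heuristically, a plot of `accuracy_vs_complexity` output would show the kind of sharp drop-off the study describes.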
The study suggests that current approaches built on large reasoning models may face fundamental barriers to generalizable reasoning, raising questions about how much they advance the field toward AGI.