A research paper by Apple has debunked the idea that large language models are reliable for reasoning tasks. The paper revealed that leading models like ChatGPT struggle as problem complexity grows and collapse when faced with novel challenges. Neural networks can generalize within their training distribution, but they fail when they encounter scenarios outside it, and as the Apple paper shows, simply scaling these models up does not solve the reasoning limitations.

Even classic puzzles like the Tower of Hanoi pose challenges for leading generative models, despite the puzzle having a simple, exact solution procedure (sketched below). The paper highlighted that LLMs lack logical, intelligent problem-solving processes. AGI should combine human adaptiveness with computational reliability, not just replicate human limits; LLMs cannot be a reliable substitute for well-specified conventional algorithms in solving complex problems.

Despite their uses in coding and writing, LLMs are not a direct path to transformative AGI. The new paper underscores the limitations of generative AI and the need for caution in trusting its outputs.
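To make the contrast concrete, here is a minimal sketch (an illustration of the standard recursive algorithm, not code from the paper) of a conventional Tower of Hanoi solver. Unlike an LLM, it is guaranteed to solve any n-disk instance correctly, in the provably optimal 2^n - 1 moves:

```python
def hanoi(n, source, target, spare, moves):
    """Move n disks from the source peg to the target peg, recording each move."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear the top n-1 disks out of the way
    moves.append((source, target))              # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)  # restack the n-1 disks on top of it

moves = []
hanoi(3, "A", "C", "B", moves)
print(moves)                     # 7 moves for 3 disks
assert len(moves) == 2**3 - 1    # always exactly 2^n - 1 moves, the optimum
```

This kind of guaranteed, verifiable correctness on every input is precisely what the paper argues generative models fail to deliver as complexity increases.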