A new research study reveals that AI models from Google, OpenAI, and Anthropic failed to solve any 'Hard' coding problems, scoring 0% on the most difficult tier.
The study, conducted by researchers from multiple universities, identified shortcomings in existing coding benchmarks and introduced LiveCodeBench Pro to evaluate models on genuinely challenging problems.
Models excelled at knowledge-heavy and logic-heavy problems but struggled with observation-heavy challenges that require novel insights.
The models' errors were frequently algorithmic rather than mere implementation slips, and extra attempts yielded only modest gains, leaving clear room for improvement.
Despite claims of surpassing elite human programmers, the models still lag significantly on tasks that demand original solutions.
A separate analysis by Oxford researcher Toby Ord suggests that AI agents' success rates decline as tasks grow longer, posing a challenge for complex coding projects.
While agents are steadily handling longer tasks, achieving high reliability still requires much shorter task durations: the higher the required success rate, the shorter the task an agent can be trusted with.
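As a rough illustration, here is a minimal Python sketch of the kind of exponential-decay ("half-life") model that underlies Ord's analysis, where an agent's success probability halves for each half-life of task duration; the 60-minute half-life below is a hypothetical value chosen for illustration, not a figure from the research.

```python
import math

def success_probability(task_minutes: float, half_life_minutes: float) -> float:
    """Success probability for a task of a given length under a
    constant-hazard model: the chance of success halves per half-life."""
    return 0.5 ** (task_minutes / half_life_minutes)

def max_task_length(target_reliability: float, half_life_minutes: float) -> float:
    """Longest task the agent can attempt while keeping at least the
    target success probability, under the same exponential model."""
    return -half_life_minutes * math.log2(target_reliability)

# Hypothetical agent with a 60-minute half-life (the 50%-success horizon).
half_life = 60.0
print(success_probability(120, half_life))  # ~0.25 at twice the horizon
print(max_task_length(0.99, half_life))     # ~0.87 minutes for 99% reliability
```

Under this toy model, an agent that succeeds half the time on hour-long tasks can only be trusted at 99% reliability on tasks of well under a minute, which is why high-reliability horizons lag so far behind headline capability gains.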
Despite rapid advances in AI capabilities, the timeline for models to reliably manage intricate coding projects remains uncertain.
A detailed technical report is available for readers seeking the full research findings.
Overall, the article underscores how far AI models remain from mastering complex coding challenges and the need for stronger reasoning and problem-solving capabilities.