Although verifying a 12-team round-robin tournament schedule is a simple task in combinatorial mathematics, major AI platforms repeatedly failed to do it accurately despite numerous attempts.
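For context, a complete single round-robin for 12 teams is small and fully determined in size: 11 rounds of 6 matches, or C(12, 2) = 66 pairings, with every pair of teams meeting exactly once and no team ever paired with itself. The case study does not describe how the schedule was constructed, so the standard circle-method sketch below is only an illustration of that structure, not the schedule from the article.

```python
def round_robin(n):
    """Build a single round-robin for n teams (n even) with the classic
    circle method: one team stays fixed, the rest rotate each round."""
    teams = list(range(n))
    rounds = []
    for _ in range(n - 1):
        # Fold the circle: first vs last, second vs second-to-last, ...
        rounds.append([(teams[i], teams[n - 1 - i]) for i in range(n // 2)])
        # Rotate every team except the fixed first one by one position.
        teams = [teams[0], teams[-1]] + teams[1:-1]
    return rounds

schedule = round_robin(12)
print(len(schedule), "rounds,", sum(len(r) for r in schedule), "matches")  # 11 rounds, 66 matches
```

Keeping one team fixed while the others cycle is what guarantees each pairing occurs exactly once across the 11 rounds.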
The AI systems involved, Claude, Grok, ChatGPT, and DeepSeek, collectively representing over $100B in VC funding, exhibited a variety of failures: hallucinated duplicates, invalid same-team flags, and false declarations of success.
Those failures included claiming schedules were error-free while duplicates remained, breakdowns in pattern recognition, and memoryless iteration that repeated earlier mistakes, ultimately requiring human intervention to complete the verification.
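Each of the reported error types is mechanically checkable. The sketch below assumes a schedule represented as a list of rounds of (team, team) pairs, a representation chosen here for illustration rather than the format given to the models, and flags self-matches, duplicate pairings, and missing matchups.

```python
from collections import Counter
from itertools import combinations

def verify_round_robin(schedule, n_teams=12):
    """Collect every violation in a claimed single round-robin schedule:
    self-matches, duplicated pairings, and pairings that never occur."""
    errors = []
    seen = Counter()
    for rnd, matches in enumerate(schedule, start=1):
        for a, b in matches:
            if a == b:
                errors.append(f"round {rnd}: team {a} is scheduled against itself")
            else:
                seen[frozenset((a, b))] += 1
    for pair, count in seen.items():
        if count > 1:
            a, b = sorted(pair)
            errors.append(f"teams {a} and {b} meet {count} times (should be exactly 1)")
    for a, b in combinations(range(n_teams), 2):
        if frozenset((a, b)) not in seen:
            errors.append(f"teams {a} and {b} never meet")
    return errors  # an empty list is the only legitimate "error-free" verdict
```

Run against the circle-method schedule above, the check returns an empty list; corrupting a single pairing produces specific, line-item complaints rather than a blanket assurance of correctness.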
The case study highlights that even today's most advanced AI systems struggle with basic combinatorial verification without human assistance, as demonstrated by Mr. McKenzie's manual verification protocol outperforming the billion-dollar AIs.