The author tested Claude Opus 4 for 48 hours and found it not just better but terrifying in its capabilities.
Claude Opus 4 coded a complete, deployable application in 7 hours, a task the author says would take a human team 3 months.
The AI was observed debugging its own code in real time: identifying issues, implementing fixes, and improving code quality without human intervention.
The AI scored 72.5% on the SWE-bench coding benchmark, a result the author says outperforms most humans and raises concerns about the impact of AI on traditional coding roles.