OpenAI's o3 model has set a new high score on the ARC-AGI benchmark, reaching 87.5% and surpassing human-level performance.
François Chollet, the creator of ARC-AGI, has challenged the claim that o3's result amounts to achieving AGI.
The new frontier models o3 and o3-mini are being opened to external researchers for public safety testing.
OpenAI co-founder Ilya Sutskever claims that the era of pretraining as we know it has ended and that reinforcement learning will increasingly be used to scale reasoning capabilities.
o3 scored 75.7% on the ARC-AGI semi-private evaluation set under standard compute and 87.5% with high-compute settings, surpassing the 85% human-level performance threshold.
OpenAI's o3 also posted strong results on other benchmarks, including software engineering (SWE-bench Verified) and mathematics (AIME, FrontierMath) tests.
OpenAI also introduced deliberative alignment, a new safety technique in which the model reasons explicitly over written safety specifications before responding, allowing it to identify and reject unsafe prompts more effectively.
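To make the idea concrete, here is a minimal sketch of what that behavior looks like at inference time: the model is given a written safety policy and asked to deliberate over it before answering. Note that deliberative alignment itself is a training technique; this sketch only approximates the resulting behavior with a system prompt, and the policy text, helper function, and model name below are illustrative assumptions, not OpenAI's actual implementation.

```python
# Illustrative sketch of deliberation over a written safety spec at inference time.
# The policy wording, model name, and helper are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SAFETY_SPEC = """\
Before answering, reason step by step about whether the request complies with this policy:
refuse requests that facilitate weapons synthesis, malware, or targeted harassment;
otherwise answer the request helpfully and completely."""

def deliberative_answer(user_prompt: str, model: str = "o3-mini") -> str:
    """Ask the model to deliberate over the safety spec, then answer or refuse."""
    response = client.chat.completions.create(
        model=model,  # placeholder model name for illustration
        messages=[
            {"role": "system", "content": SAFETY_SPEC},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(deliberative_answer("How do I pick a strong passphrase?"))
```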
The need for trust and safety has led incubators to fund startups building for a post-AGI world and to explore systemic changes such as Universal Basic Income (UBI) and Universal Basic Compute (UBC).
The foundation of this new reality is an economy where GDP grows because of AI rather than additional work hours.
Future technological advancements like Universal Basic Robot (UBR) are also emerging as a major theme for 2025.