LLMs Hit a New Low on ARC-AGI-2 Benchmark, Pure LLMs Score 0%

Analyticsindiamag

ARC Prize has announced the ARC-AGI-2 benchmark to evaluate AI models' human-like intelligence.
The benchmark is more challenging than its predecessor because it scores models on efficiency as well as performance.
Non-reasoning models (pure LLMs) scored 0%, while human participants scored 100%.
OpenAI's o3 reasoning model received the highest score of 4.0%, but will not be released as a standalone model.