<ul><li>Research organization Epoch AI released FrontierMath, a new mathematics benchmark that challenges leading AI models.</li><li>FrontierMath contains expert-level problems, of which AI models solve less than 2 percent.</li><li>Top AI models, including GPT-4o and Gemini 1.5 Pro, scored poorly on the FrontierMath benchmark.</li><li>FrontierMath differs from other benchmarks by keeping its problem set private to prevent data contamination.</li></ul>

New secret math benchmark stumps AI models and PhDs alike
