Source: Ars Technica


New secret math benchmark stumps AI models and PhDs alike

  • Research organization Epoch AI released FrontierMath, a new mathematics benchmark that challenges leading AI models.
  • FrontierMath contains expert-level problems that AI models solve less than 2 percent of the time.
  • Top AI models, including GPT-4o and Gemini 1.5 Pro, scored poorly on the FrontierMath benchmark.
  • FrontierMath differs from other benchmarks by keeping its problem set private to prevent data contamination.
