OpenAI’s New Benchmark to Study AI Agents’ Research Capabilities

Source: Analyticsindiamag

  • OpenAI unveiled PaperBench, a new benchmark to measure how well AI agents can reproduce cutting-edge AI research.
  • The benchmark consists of 20 Spotlight and Oral papers from the International Conference on Machine Learning (ICML) 2024, covering 12 different topics.
  • Anthropic’s Claude 3.5 Sonnet was the best-performing model, with an average replication score of 21.0%, while human PhD participants scored an average of 41.4%.
  • PaperBench’s code is available on GitHub, along with a lightweight version of the benchmark, PaperBench Code-Dev.
