Arxiv

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

  • Cybench is a framework for evaluating the cybersecurity capabilities and risks of language model agents, specifically their ability to identify vulnerabilities and execute exploits autonomously.
  • It includes 40 professional-level Capture the Flag (CTF) tasks drawn from 4 distinct competitions, spanning a wide range of difficulty.
  • The authors evaluated several language models, including GPT-4o and Claude 3.5 Sonnet, and found that without guidance the agents solved only the easiest tasks, those that took human teams up to 11 minutes to complete.
  • The framework and all related code and data are publicly available at https://cybench.github.io.
