Following the rapid increase in Artificial Intelligence (AI) capabilities in recent years, the AI community has voiced concerns regarding possible safety risks.
To support decision-making on the safe use and development of AI systems, there is a growing need for high-quality evaluations of dangerous model capabilities.
This practitioners' perspective paper presents a set of best practices for safety evaluations, drawing on prior work in model evaluation and illustrating the practices through cybersecurity examples.
The paper discusses the initial thought process for designing an evaluation, the characteristics that make an evaluation useful, and additional considerations for building a comprehensive evaluation suite.