<ul><li>Following the rapid growth in Artificial Intelligence (AI) capabilities in recent years, the AI community has voiced concerns about potential safety risks.</li><li>To support decision-making on the safe use and development of AI systems, there is a growing need for high-quality evaluations of dangerous model capabilities.</li><li>This practitioners' perspective paper presents a set of best practices for safety evaluations, drawing on prior work in model evaluation and illustrating them with cybersecurity examples.</li><li>The paper discusses the steps of the initial thought process when designing an evaluation, the characteristics of a useful evaluation, and additional considerations for building a comprehensive evaluation suite.</li></ul>

What Makes an Evaluation Useful? Common Pitfalls and Best Practices
