OpenAI has introduced the Evals API, a toolset for evaluating model outputs programmatically rather than by ad-hoc manual review.
The Evals API lets developers define tests, automate evaluation runs, and iterate on prompts directly from their workflows. It supports systematic evaluation of large language models (LLMs) on custom test cases, automated quality assurance in development pipelines, and measurement of improvements across prompt iterations.
OpenAI aims to treat evaluation as a first-class citizen in the development cycle, similar to how unit tests are treated in traditional software engineering.
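As a rough illustration of the workflow this targets, the core idea can be sketched as a tiny local harness that pairs test cases with graders and reports a pass rate. The names below (`run_eval`, `toy_model`) are hypothetical stand-ins, not the actual Evals API surface, which defines and runs evals server-side:

```python
# Minimal sketch of an "evals as unit tests" loop; illustrative only.
# The real Evals API manages test definitions and runs via OpenAI's platform.

def run_eval(model_fn, test_cases):
    """Run model_fn on each case's input, grade the output, return pass rate."""
    passed = 0
    for case in test_cases:
        output = model_fn(case["input"])
        if case["grader"](output):
            passed += 1
    return passed / len(test_cases)

# Hypothetical stand-in for a model call (real code would query an LLM).
def toy_model(prompt):
    return prompt.upper()

test_cases = [
    {"input": "hello", "grader": lambda out: out == "HELLO"},
    {"input": "world", "grader": lambda out: out.isupper()},
]

print(run_eval(toy_model, test_cases))  # → 1.0
```

Tracking this pass rate across prompt revisions is what makes regressions visible, in the same way a failing unit test does.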