Recent advances in vision-language models (VLMs) offer the potential to automate design assessments, but it is crucial to ensure that these AI ``judges'' perform on par with human experts.
This work introduces a statistical framework for determining whether an AI judge's ratings are equivalent to those of human experts in design evaluation.
The top-performing AI judge, which uses text- and image-based in-context learning, achieves expert-level agreement on uniqueness and drawing quality and matches or outperforms trained novices on all metrics.
Reasoning-supported VLMs can thus achieve human-expert equivalence in design evaluation, with implications for design education and practice.