<ul><li>BlenderGym is the first comprehensive benchmark for vision-language model (VLM) systems in 3D graphics editing.</li><li>It evaluates VLM systems through code-based 3D reconstruction tasks.</li><li>BlenderGym reveals that even state-of-the-art VLM systems struggle with tasks that are relatively easy for human Blender users.</li><li>The study also explores how inference scaling techniques and the distribution of inference compute affect VLM performance on graphics editing tasks.</li></ul>

BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing
