BlenderGym is the first comprehensive benchmark for vision-language models (VLMs) in 3D graphics editing. It evaluates VLM systems through code-based 3D reconstruction tasks. BlenderGym reveals that even state-of-the-art VLM systems struggle with tasks that are relatively easy for human Blender users. The study also explores how inference scaling techniques and the distribution of inference compute affect VLM performance on graphics editing tasks.