Recent advances in video generation models have highlighted the need for effective evaluation.Automated evaluation metrics lack high-level semantic understanding and reasoning capabilities for video.To address this, GRADEO introduces a video evaluation model that uses multi-step reasoning.Experiments show that GRADEO aligns better with human evaluations, revealing the limitations of current video generation models.