<ul><li>Process-supervised reward models (PRMs) offer fine-grained, step-wise feedback on model responses, aiding in selecting effective reasoning paths for complex tasks.</li><li>Existing reward benchmarks primarily focus on text-based models, with some specifically designed for PRMs. In the vision-language domain, evaluation methods generally assess broad model capabilities.</li><li>Researchers from UC Santa Cruz, UT Dallas, and Amazon Research benchmarked VLLMs as ORMs and PRMs across multiple tasks, revealing that neither consistently outperforms the other.</li><li>VLLMs are increasingly effective across various tasks, particularly when evaluated for test-time scaling. ORMs generally outperform PRMs, while a hybrid approach between ORM and PRM is optimal.</li></ul>

Advancing Vision-Language Reward Models: Challenges, Benchmarks, and the Role of Process-Supervised Learning

Discover more