Recent advances in Chain-of-Thought (CoT) generation have improved the reasoning capabilities of Large Language Models (LLMs).
SEED-Bench-R1 is a benchmark designed to evaluate post-training methods for Multimodal Large Language Models (MLLMs) in video understanding.
Reinforcement Learning (RL) demonstrates greater data efficiency than supervised fine-tuning (SFT) and achieves superior performance on both in-distribution and out-of-distribution tasks.
However, RL often yields less logically coherent reasoning chains, exhibiting limitations such as inconsistent logic and overlooked visual cues.