VELOCITI is a benchmark for studying Video-LLMs and assessing their compositional reasoning in short videos. It disentangles and evaluates the comprehension of agents, actions, and their associations across multiple events. Current video models such as LLaVA-OneVision and Gemini-1.5-Pro fall well short of human accuracy when classifying positive and negative captions. The benchmark also highlights the limitations of ClassicVLE and multiple-choice evaluation, and argues for StrictVLE as the preferred protocol.
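To illustrate why a stricter protocol matters, here is a minimal Python sketch contrasting the two scoring schemes. It assumes (this is an interpretation, not the benchmark's released code) that ClassicVLE counts a positive/negative caption pair as correct when the model scores the positive caption above the negative one, while StrictVLE requires the model to accept the positive caption and reject the negative one independently. The `PairScores` type, the function names, the 0.5 threshold, and the example scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PairScores:
    # Hypothetical entailment probabilities produced by a Video-LLM for one video.
    p_pos: float  # probability that the positive caption matches the video
    p_neg: float  # probability that the negative caption matches the video


def classic_vle_acc(pairs: list[PairScores]) -> float:
    # Assumed ClassicVLE: a pair is correct if the positive caption outranks the negative.
    return sum(p.p_pos > p.p_neg for p in pairs) / len(pairs)


def strict_vle_acc(pairs: list[PairScores], threshold: float = 0.5) -> float:
    # Assumed StrictVLE: the positive must be accepted AND the negative rejected.
    return sum(p.p_pos > threshold and p.p_neg <= threshold for p in pairs) / len(pairs)


pairs = [
    PairScores(0.9, 0.2),  # accepts positive, rejects negative
    PairScores(0.4, 0.3),  # ranks correctly but rejects both captions
    PairScores(0.8, 0.7),  # ranks correctly but accepts both captions
]
print(classic_vle_acc(pairs))  # → 1.0
print(strict_vle_acc(pairs))   # → 0.3333333333333333
```

Under these assumptions, every pair counted as correct by the strict rule is also correct under the pairwise rule, so strict accuracy can never exceed classic accuracy. The gap exposes models that merely rank captions without actually rejecting the wrong one.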