Source: Arxiv

Quo Vadis, Anomaly Detection? LLMs and VLMs in the Spotlight

  • This paper surveys the 2024 landscape of large language models (LLMs) and vision-language models (VLMs) applied to video anomaly detection (VAD).
  • Integrating LLMs and VLMs into VAD enhances interpretability, captures temporal relationships, enables few-shot and zero-shot detection, and addresses open-world and class-agnostic anomalies.
  • LLMs and VLMs contribute semantic insights, textual explanations, and motion features for spatiotemporal coherence, making visual anomalies easier to understand.
  • The paper examines how LLMs and VLMs could redefine VAD and proposes future directions for exploiting the synergy between visual and textual modalities.
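
A common pattern behind the zero-shot detection the bullets mention is scoring each video frame by its similarity to natural-language prompts (e.g. "a normal scene" vs. "an anomalous event") in a shared VLM embedding space. The sketch below is a minimal, hypothetical illustration of that idea: the toy vectors stand in for embeddings that a real system would obtain from a CLIP-style model, and all names are illustrative, not from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def anomaly_score(frame_emb, normal_emb, anomalous_emb, temperature=0.1):
    """Softmax over frame-text similarities; returns P(anomalous).

    frame_emb: embedding of a video frame (toy stand-in here).
    normal_emb / anomalous_emb: embeddings of the two text prompts.
    """
    sims = [cosine(frame_emb, normal_emb), cosine(frame_emb, anomalous_emb)]
    exps = [math.exp(s / temperature) for s in sims]
    return exps[1] / sum(exps)

# Toy embeddings: this frame lies closer to the "anomalous event" prompt.
normal_prompt = [1.0, 0.0, 0.0]
anomalous_prompt = [0.0, 1.0, 0.0]
frame = [0.2, 0.9, 0.1]

score = anomaly_score(frame, normal_prompt, anomalous_prompt)
print(f"P(anomalous) = {score:.3f}")
```

Because the scoring relies only on text prompts, no anomaly-labeled training frames are needed, which is what makes the approach zero-shot; swapping in new prompt pairs also makes it class-agnostic.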
