SpikeVideoFormer is introduced as an efficient spike-driven video Transformer with linear temporal complexity O(T).
The model features a spike-driven Hamming attention (SDHA), which adapts conventional real-valued dot-product attention to binary spike features by measuring similarity in Hamming space.
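The key observation behind a Hamming-based attention is that, for binary vectors, the Hamming distance can be expressed with the same matrix products as dot-product attention. The following is a minimal NumPy sketch of this idea, not the paper's exact SDHA formulation; the function name and the `(N, d)` shapes are illustrative assumptions.

```python
import numpy as np

def hamming_attention(Q, K, V):
    """Illustrative sketch of attention via Hamming similarity.

    Q, K, V: binary spike tensors of shape (N, d) with entries in {0, 1}.
    For binary vectors q, k, the Hamming distance satisfies
        d_H(q, k) = |q| + |k| - 2 * (q . k),
    so a Hamming *similarity* (d - d_H) is computable with ordinary
    matrix products, keeping the pipeline multiplication-light and
    softmax-free (details of SDHA's normalization are omitted here).
    """
    N, d = Q.shape
    # Pairwise Hamming distances via the dot-product identity above.
    dist = Q.sum(1, keepdims=True) + K.sum(1) - 2 * Q @ K.T   # (N, N)
    sim = d - dist        # agreement count: higher means more similar
    return sim @ V        # integer-valued aggregation of spike values

# Toy usage with random binary spike maps.
rng = np.random.default_rng(0)
Q = (rng.random((4, 8)) > 0.5).astype(int)
K = (rng.random((4, 8)) > 0.5).astype(int)
V = (rng.random((4, 8)) > 0.5).astype(int)
out = hamming_attention(Q, K, V)
```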
Several spike-driven space-time attention designs are analyzed to identify a scheme that is well suited to video tasks while retaining linear temporal complexity.
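Linear temporal complexity in attention typically comes from matrix-product associativity: computing K^T V first costs O(T d^2) rather than the O(T^2 d) of forming the full T-by-T attention map. A hedged sketch of this reordering (generic linear attention, not the paper's specific design; names and shapes are assumptions):

```python
import numpy as np

def linear_spacetime_attention(Q, K, V):
    """Sketch: associativity yields cost linear in sequence length T.

    Q, K, V: (T, d) binary spike tensors. Computing K.T @ V first is a
    single pass over the T timesteps, O(T * d^2); multiplying by Q is
    another O(T * d^2). Contrast with (Q @ K.T) @ V, which materializes
    a (T, T) map at O(T^2 * d) cost.
    """
    kv = K.T @ V   # (d, d) summary accumulated over all timesteps
    return Q @ kv  # (T, d); identical result by associativity

# Toy usage with random binary spike tensors.
rng = np.random.default_rng(1)
T, d = 16, 8
Q = (rng.random((T, d)) > 0.5).astype(int)
K = (rng.random((T, d)) > 0.5).astype(int)
V = (rng.random((T, d)) > 0.5).astype(int)
out = linear_spacetime_attention(Q, K, V)
```

Because the tensors are integer-valued, the reordered product matches the quadratic-cost version exactly, with no floating-point drift.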
SpikeVideoFormer demonstrates strong performance across diverse video tasks, such as classification, human pose tracking, and semantic segmentation, outperforming existing SNN approaches while offering significant efficiency gains.