Speculative decoding is a technique for improving the efficiency of large-scale autoregressive Transformer models: a cheap draft model proposes several tokens, and the target model verifies them all in a single forward pass, so multiple tokens can be accepted per target-model call.
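A minimal sketch of the draft-and-verify loop is shown below. The names `draft_model`, `target_model`, and the greedy acceptance rule are illustrative assumptions (the original schemes typically use rejection sampling), not the paper's exact procedure.

```python
import torch

def speculative_decode_step(draft_model, target_model, prefix, k=4):
    """One speculative decoding step: the draft model proposes k tokens,
    the target model scores them in a single forward pass, and the longest
    prefix the target agrees with is accepted (greedy variant, for clarity)."""
    # Draft model proposes k tokens autoregressively (cheap).
    draft_tokens = []
    ctx = prefix
    for _ in range(k):
        logits = draft_model(ctx)[:, -1, :]            # (batch, vocab)
        next_tok = logits.argmax(dim=-1, keepdim=True)
        draft_tokens.append(next_tok)
        ctx = torch.cat([ctx, next_tok], dim=-1)

    # Target model verifies all k proposals in one forward pass
    # (expensive, but parallel over the proposed positions).
    target_logits = target_model(ctx)[:, -(k + 1):-1, :]
    target_choice = target_logits.argmax(dim=-1)       # (batch, k)

    # Accept the longest prefix where draft and target agree.
    accepted = prefix
    for i, tok in enumerate(draft_tokens):
        if torch.equal(target_choice[:, i:i + 1], tok):
            accepted = torch.cat([accepted, tok], dim=-1)
        else:
            break
    return accepted
```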
Speculative decoding has been extended to state-space models (SSMs), leveraging hardware concurrency to make their inference more efficient.
A scalable algorithm has been proposed for tree-based speculative decoding in SSMs and in hybrid architectures that combine SSM and Transformer layers, built on accumulated state transition matrices; a sketch of the idea follows.
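The sketch below illustrates the accumulation idea under a simplifying assumption of a diagonal discretized SSM recurrence h_t = Ā_t h_{t-1} + B̄_t x_t: once a draft path in the speculation tree is verified, its steps can be collapsed into a single state update by multiplying the per-token transition matrices. The names `h_root`, `A_bars`, and `Bx_bars` are hypothetical, and this is not the paper's exact implementation.

```python
import torch

def jump_state(h_root, A_bars, Bx_bars):
    """Collapse m accepted draft steps into one state update by accumulating
    diagonal state transition matrices, instead of replaying the recurrence
    token by token after verification.

    h_root  : (d_state,)     SSM state at the tree root
    A_bars  : (m, d_state)   diagonal discretized transitions Ā_t per token
    Bx_bars : (m, d_state)   discretized input contributions B̄_t x_t
    """
    # h_m = (Ā_m ... Ā_1) h_root + sum_s (Ā_m ... Ā_{s+1}) B̄_s x_s
    A_acc = A_bars.prod(dim=0)                          # accumulated transition
    # Suffix products Ā_m ... Ā_{s+1} for each step s on the path.
    rev_cum = torch.flip(torch.cumprod(torch.flip(A_bars, [0]), dim=0), [0])
    suffix = torch.cat([rev_cum[1:], torch.ones_like(A_bars[:1])], dim=0)
    input_acc = (suffix * Bx_bars).sum(dim=0)
    return A_acc * h_root + input_acc
```

Because each tree node's state depends only on the root state and the accumulated products along its path, states for many candidate branches can be evaluated concurrently on the accelerator rather than sequentially.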
The proposed hardware-aware implementation outperforms vanilla speculative decoding with SSMs on three benchmarks, paving the way for faster and more efficient inference with SSM and hybrid models.