<ul data-eligibleForWebStory="false"><li>Speculative decoding reduces the inference latency of large language models by letting a smaller, faster draft model propose tokens that the target model then verifies.</li><li>Previous methods for choosing the candidate length, the number of tokens the draft model proposes per round, may not be optimal.</li><li>Researchers propose SpecDec++, an improved speculative decoding approach that determines the candidate length dynamically, based on acceptance probabilities.</li><li>In the reported experiments, SpecDec++ achieves significant speedups over traditional speculative decoding methods on various datasets.</li></ul>
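To make the idea of an adaptive candidate length concrete, here is a minimal sketch of one possible stopping rule. It is an illustration only and not necessarily the authors' exact procedure. It assumes that some predictor already supplies a per-token acceptance probability for each drafted token, and that drafting stops once the estimated chance of at least one rejection passes a threshold. The function name `choose_candidate_length` and the parameters `stop_threshold` and `max_length` are hypothetical.

```python
from typing import Iterable


def choose_candidate_length(
    acceptance_probs: Iterable[float],
    stop_threshold: float = 0.7,  # hypothetical default, not from the source
    max_length: int = 20,         # hypothetical cap on drafted tokens
) -> int:
    """Return how many draft tokens to propose before verification.

    Assumption: acceptance_probs[i] estimates the probability that the
    i-th drafted token is accepted, given that all earlier ones were.
    """
    prob_all_accepted = 1.0
    length = 0
    for p in acceptance_probs:
        if length >= max_length:
            break
        # Draft one more token and update the chance that every
        # drafted token so far survives verification.
        prob_all_accepted *= p
        length += 1
        # Stop when the estimated probability of at least one
        # rejection exceeds the threshold.
        if 1.0 - prob_all_accepted > stop_threshold:
            break
    return length
```

With a fixed candidate length, the draft model proposes the same number of tokens whether or not they are likely to be accepted. A rule of this shape instead hands control back to the target model as soon as further drafting looks likely to be wasted.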

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
