Speculative decoding aims to reduce the inference latency of large language models by letting a smaller, faster draft model propose candidate tokens that the target model then verifies in parallel.
Its efficiency depends on the candidate length, the number of draft tokens proposed per verification round, and previous methods for choosing this parameter may not be optimal.
Researchers propose SpecDec++, an improved speculative decoding approach that dynamically determines candidate lengths based on acceptance probabilities.
SpecDec++ shows significant speedups over traditional speculative decoding on various datasets.
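
The idea of choosing the candidate length from acceptance probabilities can be illustrated with a minimal sketch. It is not the SpecDec++ implementation: the function name `choose_candidate_length`, the parameters `stop_threshold` and `max_length`, and the stopping rule itself (stop drafting once the estimated probability that at least one candidate has been rejected exceeds a threshold) are assumptions made for illustration. The per-token acceptance probabilities are taken as given and would come from some predictor that is not modeled here.

```python
from typing import Iterable


def choose_candidate_length(
    accept_probs: Iterable[float],
    stop_threshold: float = 0.7,  # hypothetical default, not taken from the source
    max_length: int = 20,         # hypothetical cap on draft tokens per round
) -> int:
    """Return how many draft tokens to propose before calling the target model.

    accept_probs yields, for each successive draft token, an estimated
    probability that the target model will accept it (assumed to be given).
    """
    prob_all_accepted = 1.0
    length = 0
    for p in accept_probs:
        if length >= max_length:
            break
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"acceptance probability out of range: {p}")
        # Treat acceptances as independent: P(all accepted) is the product.
        prob_all_accepted *= p
        length += 1
        # Stop drafting once a rejection somewhere in the run is likely.
        if 1.0 - prob_all_accepted > stop_threshold:
            break
    return length


if __name__ == "__main__":
    # The third token is uncertain, so drafting stops after it.
    print(choose_candidate_length([0.9, 0.9, 0.5, 0.9], stop_threshold=0.5))
    # A confident run keeps drafting until the probabilities run out.
    print(choose_candidate_length([0.99] * 10, stop_threshold=0.5))
```

Under these assumptions, a lower `stop_threshold` hands control to the target model sooner and wastes fewer draft tokens, while a higher one drafts longer runs and saves target-model calls when the draft tokens are in fact accepted.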