Speculative decoding is a popular technique for accelerating Large Language Model (LLM) inference while maintaining text generation quality.
BanditSpec, a training-free online learning framework, is proposed to adaptively choose speculative decoding hyperparameter configurations as text is generated.
BanditSpec formulates hyperparameter selection as a Multi-Armed Bandit problem and introduces two bandit-based algorithms, UCBSpec and EXP3Spec.
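To make the bandit formulation concrete, the following is a minimal sketch of the textbook UCB1 rule applied to a small set of candidate configurations. It is not the UCBSpec algorithm itself: the arm names (`config_a`, `config_b`, `config_c`), their mean rewards, and the Bernoulli reward are hypothetical stand-ins for whatever feedback a speculative decoding step would provide (for example, a normalized count of accepted tokens).

```python
import math
import random


class UCB1Selector:
    """Textbook UCB1 over a fixed set of configurations (arms)."""

    def __init__(self, arms, exploration=2.0):
        self.arms = list(arms)
        self.exploration = exploration
        self.counts = [0] * len(self.arms)   # pulls per arm
        self.means = [0.0] * len(self.arms)  # running mean reward per arm
        self.t = 0                           # total pulls so far

    def select(self):
        # Play every arm once before using confidence bounds.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # Empirical mean plus an exploration bonus that shrinks with pulls.
        scores = [
            m + math.sqrt(self.exploration * math.log(self.t) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        # Incremental update of the running mean.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(num_rounds=5000, seed=0):
    """Run UCB1 against a simulated environment; return pulls per arm."""
    rng = random.Random(seed)
    # Hypothetical configurations with made-up mean rewards in [0, 1].
    true_means = {"config_a": 0.3, "config_b": 0.5, "config_c": 0.8}
    selector = UCB1Selector(true_means.keys())
    for _ in range(num_rounds):
        arm = selector.select()
        name = selector.arms[arm]
        # Bernoulli reward standing in for per-step decoding feedback.
        reward = 1.0 if rng.random() < true_means[name] else 0.0
        selector.update(arm, reward)
    return dict(zip(selector.arms, selector.counts))


if __name__ == "__main__":
    print(simulate())
```

In this simulation the selector spends most of its pulls on the arm with the highest mean reward, which is the behavior a bandit-based hyperparameter selector relies on. An adversarial variant in the spirit of EXP3 would replace the confidence bound with exponentially weighted sampling probabilities.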
Experiments show that UCBSpec and EXP3Spec select hyperparameters effectively for LLMs, with performance close to that of the best hyperparameters in real-life scenarios.