<ul><li>Large Language Models (LLMs) face a trade-off between inference quality and computational cost.</li><li>Existing serving strategies do not adapt dynamically to changes in user requests or system performance.</li><li>SpecRouter is a framework for adaptive routing in LLM inference, built on multi-level speculative decoding.</li><li>It combines three mechanisms: adaptive model chain scheduling, multi-level collaborative verification, and synchronized state management.</li></ul>
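The idea of multi-level speculative decoding summarized above can be illustrated with a minimal greedy sketch. This is not SpecRouter's implementation: the toy "models" (`small`, `medium`, `target`), the helper names, and the draft length `k` are all assumptions made for illustration, and the sketch leaves out adaptive chain scheduling and state management entirely. It only shows that a chain of progressively stronger verifiers reproduces the final model's greedy output.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
Model = Callable[[Sequence[int]], int]

def draft(model: Model, prefix: List[int], k: int) -> List[int]:
    """Autoregressively propose k tokens with the cheapest model."""
    out: List[int] = []
    for _ in range(k):
        out.append(model(prefix + out))
    return out

def verify(model: Model, prefix: List[int], proposal: List[int]) -> List[int]:
    """Keep the longest prefix of `proposal` that `model` agrees with, then
    append the verifier's own next token (a correction or a bonus token).
    A real system would score all positions in one forward pass; this toy
    version calls the model once per position for clarity."""
    accepted: List[int] = []
    for tok in proposal:
        expected = model(prefix + accepted)
        if expected != tok:
            return accepted + [expected]
        accepted.append(tok)
    return accepted + [model(prefix + accepted)]

def generate(chain: List[Model], prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """chain[0] drafts; each later model verifies the previous level's output.
    chain[-1] is the target model whose greedy output must be preserved."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < n_new:
        proposal = draft(chain[0], tokens, k)
        for verifier in chain[1:]:
            proposal = verify(verifier, tokens, proposal)
        tokens += proposal
    return tokens[: len(prompt) + n_new]

# Toy models (assumptions): the target counts upward modulo 10; the cheaper
# models make deterministic mistakes at certain prefix lengths.
def target(prefix: Sequence[int]) -> int:
    return (prefix[-1] + 1) % 10

def medium(prefix: Sequence[int]) -> int:
    return 0 if len(prefix) % 5 == 0 else target(prefix)

def small(prefix: Sequence[int]) -> int:
    return 0 if len(prefix) % 3 == 0 else target(prefix)

if __name__ == "__main__":
    print(generate([small, medium, target], [0], 12))
    print(generate([target], [0], 12))
```

Because every token that survives the last verifier either matches the target's greedy choice or is emitted by the target itself, the three-level chain and the target-only chain produce the same sequence; the chain only changes how many target calls are needed to get there.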

SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
