Researchers propose SCORPIO, a new serving system for Large Language Models (LLMs) that schedules requests around their Service Level Objectives (SLOs), such as Time to First Token (TTFT) and Time Per Output Token (TPOT), to maximize system goodput and SLO attainment.
SCORPIO leverages SLO heterogeneity for adaptive scheduling across admission control, queue management, and batch selection, featuring a TTFT Guard and a TPOT Guard supported by a predictive module.
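To make the idea of deadline-aware admission and queue ordering concrete, here is a minimal Python sketch. It is not SCORPIO's actual algorithm: the `Request` class, the `schedule` function, and the `prefill_cost` field (standing in for the output of a predictive module) are hypothetical names chosen for illustration, and the policy shown is a plain earliest-deadline-first queue that rejects requests predicted to miss their TTFT deadline.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    # Only the deadline takes part in ordering, so the heap pops the most urgent request first.
    ttft_deadline: float                          # absolute time by which the first token is due
    req_id: str = field(compare=False)
    prefill_cost: float = field(compare=False)    # predicted prefill time in seconds (assumed predictor output)


def schedule(requests, now=0.0):
    """Serve requests in deadline order; reject those predicted to miss their TTFT deadline."""
    heap = list(requests)
    heapq.heapify(heap)
    admitted, rejected = [], []
    clock = now
    while heap:
        req = heapq.heappop(heap)
        if clock + req.prefill_cost > req.ttft_deadline:
            # Admitting this request would waste capacity on an SLO that cannot be met.
            rejected.append(req.req_id)
        else:
            clock += req.prefill_cost
            admitted.append(req.req_id)
    return admitted, rejected


if __name__ == "__main__":
    reqs = [Request(1.0, "a", 0.5), Request(0.6, "b", 0.4), Request(0.7, "c", 0.5)]
    print(schedule(reqs))
```

Because each request carries its own deadline, a request with a loose TTFT target can wait behind one with a tight target, which is one simple way heterogeneous SLOs can be used in scheduling.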
Evaluations show that SCORPIO improves system goodput by up to 14.4 times and SLO adherence by up to 46.5% compared with existing LLM serving baselines.
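For readers unfamiliar with these metrics, the following Python sketch shows one common way to compute them from per-request timestamps. It assumes goodput means requests that meet both their TTFT and TPOT targets per second, and that SLO attainment is the fraction of requests meeting both; the summary does not state the exact definitions used in the evaluation, and the `Trace` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    arrival: float          # time the request arrived (seconds)
    first_token: float      # time the first output token was emitted
    finish: float           # time the last output token was emitted
    n_output_tokens: int
    ttft_slo: float         # per-request TTFT target (seconds)
    tpot_slo: float         # per-request TPOT target (seconds per token)


def meets_slo(t: Trace) -> bool:
    ttft = t.first_token - t.arrival
    # TPOT averages the decode time over the tokens generated after the first one.
    decode_steps = max(t.n_output_tokens - 1, 1)
    tpot = (t.finish - t.first_token) / decode_steps
    return ttft <= t.ttft_slo and tpot <= t.tpot_slo


def summarize(traces, window_seconds: float) -> dict:
    ok = sum(meets_slo(t) for t in traces)
    return {
        "slo_attainment": ok / len(traces),      # fraction of requests meeting both SLOs
        "goodput_rps": ok / window_seconds,      # SLO-compliant requests per second
    }


if __name__ == "__main__":
    traces = [
        Trace(0.0, 0.5, 2.5, 21, ttft_slo=1.0, tpot_slo=0.15),
        Trace(0.0, 2.0, 3.0, 11, ttft_slo=1.0, tpot_slo=0.15),
    ]
    print(summarize(traces, window_seconds=4.0))
```

Note that the SLO targets are stored per request rather than globally, which reflects the heterogeneous-SLO setting the summary describes.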