techminis

A naukri.com initiative

Arxiv

1w

SCORPIO: Serving the Right Requests at the Right Time for Heterogeneous SLOs in LLM Inference

  • Researchers propose SCORPIO, a serving system for Large Language Models (LLMs) that schedules requests around their Service Level Objectives (SLOs), such as Time to First Token (TTFT) and Time Per Output Token (TPOT), to maximize system goodput and SLO attainment.
  • SCORPIO leverages SLO heterogeneity for adaptive scheduling across admission control, queue management, and batch selection, featuring a TTFT Guard and a TPOT Guard supported by a predictive module.
  • Evaluations show that SCORPIO can improve system goodput by up to 14.4 times and SLO adherence by up to 46.5% compared with existing LLM serving baselines.
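
To make the two reported metrics concrete, here is a minimal Python sketch of how SLO attainment and goodput could be computed when each request carries its own TTFT and TPOT targets. It is not taken from the SCORPIO paper: the `RequestResult` class, the function names, and the definition of goodput as "SLO-satisfying requests per second" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestResult:
    """Hypothetical record of one served request (names are illustrative)."""
    ttft_s: float      # measured time to first token, in seconds
    tpot_s: float      # measured mean time per output token, in seconds
    ttft_slo_s: float  # this request's TTFT target, in seconds
    tpot_slo_s: float  # this request's TPOT target, in seconds

    def meets_slo(self) -> bool:
        # A request counts as successful only if both targets are met.
        return self.ttft_s <= self.ttft_slo_s and self.tpot_s <= self.tpot_slo_s


def slo_attainment(results: List[RequestResult]) -> float:
    """Fraction of requests that met their own SLOs (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r.meets_slo() for r in results) / len(results)


def goodput(results: List[RequestResult], window_s: float) -> float:
    """Assumed definition: SLO-satisfying requests completed per second."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return sum(r.meets_slo() for r in results) / window_s


# Four requests with heterogeneous SLOs, observed over a 2-second window.
results = [
    RequestResult(ttft_s=0.20, tpot_s=0.03, ttft_slo_s=0.50, tpot_slo_s=0.05),  # meets both
    RequestResult(ttft_s=0.90, tpot_s=0.03, ttft_slo_s=0.50, tpot_slo_s=0.05),  # misses TTFT
    RequestResult(ttft_s=1.50, tpot_s=0.08, ttft_slo_s=2.00, tpot_slo_s=0.10),  # loose SLO, meets both
    RequestResult(ttft_s=0.30, tpot_s=0.04, ttft_slo_s=0.50, tpot_slo_s=0.05),  # meets both
]

attainment = slo_attainment(results)
rate = goodput(results, window_s=2.0)
print(f"SLO attainment: {attainment:.2f}, goodput: {rate:.2f} req/s")
```

The third request shows why heterogeneity matters: it is the slowest of the four in absolute terms, yet it still counts toward goodput because its targets are looser.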
