Arxiv

DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services

  • The rapid adoption of large language models (LLMs) in text streaming services makes it hard to meet real-time interaction requirements at acceptable cost and Quality of Experience (QoE).
  • DiSCo is a device-server cooperative scheduler that improves users' QoE by dynamically routing requests and migrating response generation between endpoints, subject to cost constraints.
  • The scheduler uses cost-aware scheduling to leverage both on-device and server-based LLM inference, reducing tail Time-To-First-Token (TTFT) by 11-52% and mean TTFT by 6-78% across various model-device configurations.
  • In evaluations on real-world workloads, DiSCo's migration mechanism reduces serving costs by up to 84% while maintaining comparable QoE levels.
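
The summary does not describe DiSCo's actual algorithm, so the following is only a rough sketch of what cost-aware routing and migration between a device and a server endpoint could look like. Every name here (`Endpoint`, `route_request`, `maybe_migrate`), the per-token cost model, and the thresholds are hypothetical and chosen for illustration; none of them is taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Hypothetical description of one inference endpoint."""
    name: str
    expected_ttft_s: float   # assumed estimate of Time-To-First-Token, in seconds
    cost_per_token: float    # assumed cost (money or energy) per generated token


def route_request(device: Endpoint, server: Endpoint,
                  budget_remaining: float, expected_tokens: int) -> Endpoint:
    """Pick the endpoint with the lowest expected TTFT that fits the budget.

    If neither endpoint fits the budget, fall back to the cheaper one.
    """
    affordable = [
        e for e in (device, server)
        if e.cost_per_token * expected_tokens <= budget_remaining
    ]
    if not affordable:
        return min((device, server), key=lambda e: e.cost_per_token)
    return min(affordable, key=lambda e: e.expected_ttft_s)


def maybe_migrate(current: Endpoint, other: Endpoint,
                  tokens_left: int, min_saving: float) -> Endpoint:
    """Hand the rest of the generation to `other` if it saves enough cost.

    `min_saving` is an assumed threshold that stands in for migration overhead.
    """
    saving = (current.cost_per_token - other.cost_per_token) * tokens_left
    return other if saving > min_saving else current


# Illustrative numbers only: a slow but free device, a fast but paid server.
device = Endpoint("device", expected_ttft_s=0.8, cost_per_token=0.0)
server = Endpoint("server", expected_ttft_s=0.3, cost_per_token=0.002)

first = route_request(device, server, budget_remaining=1.0, expected_tokens=200)
later = maybe_migrate(first, device, tokens_left=150, min_saving=0.05)
print(first.name, "->", later.name)
```

In this toy setup the request starts on the faster server to keep TTFT low, then the remaining tokens move to the cheaper device. That mirrors the idea in the bullets above, but the real scheduler's policy may differ substantially.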
