Image Credit: Arxiv

Scaling Up On-Device LLMs via Active-Weight Swapping Between DRAM and Flash

  • Large language models (LLMs) are being deployed on mobile devices, but limited DRAM capacity constrains the model size.
  • ActiveFlow is introduced as an LLM inference framework that enables adaptive DRAM usage for modern LLMs.
  • ActiveFlow utilizes novel techniques such as cross-layer active weights preloading and sparsity-aware self-distillation.
  • The framework reaches the performance-cost Pareto frontier relative to existing efficiency optimization methods; a rough sketch of the weight-swapping idea follows this list.
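
To make the bullets above more concrete, here is a minimal, hedged Python sketch of what "active-weight swapping" with preloading can look like: only the predicted-active rows of each layer's weights are copied from flash into DRAM, and the next layer's rows are prefetched on a background thread while the current layer computes. This is not ActiveFlow's implementation; the names (flash_dir, predict_active_rows, load_active_rows) and the top-k activity heuristic are illustrative assumptions, with .npy files on disk standing in for flash.

```python
# Sketch of active-weight swapping between "flash" (disk) and DRAM.
# Not the paper's code: predictor, file layout, and sizes are toy assumptions.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np

HIDDEN, LAYERS = 256, 4

# "Flash": persist each layer's full weight matrix to disk.
flash_dir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
for i in range(LAYERS):
    np.save(os.path.join(flash_dir, f"layer{i}.npy"),
            rng.standard_normal((HIDDEN, HIDDEN), dtype=np.float32))

def predict_active_rows(x, k=64):
    """Toy activity predictor: pretend the top-k |x| coordinates tell us
    which weight rows will matter for the next layer."""
    return np.argsort(-np.abs(x))[:k]

def load_active_rows(layer, rows):
    """Copy only the predicted-active rows from flash into DRAM.
    mmap_mode='r' avoids reading the whole matrix from disk."""
    w = np.load(os.path.join(flash_dir, f"layer{layer}.npy"), mmap_mode="r")
    return rows, np.asarray(w[rows])

x = rng.standard_normal(HIDDEN).astype(np.float32)
with ThreadPoolExecutor(max_workers=1) as io:
    # Preload layer 0's active rows before compute starts.
    pending = io.submit(load_active_rows, 0, predict_active_rows(x))
    for layer in range(LAYERS):
        rows, w_active = pending.result()      # wait for the DRAM copy
        if layer + 1 < LAYERS:                 # overlap I/O with compute
            pending = io.submit(load_active_rows, layer + 1,
                                predict_active_rows(x))
        y = np.zeros(HIDDEN, dtype=np.float32)
        y[rows] = w_active @ x                 # compute with active rows only
        x = np.maximum(y, 0.0)                 # ReLU keeps activations sparse
print("output norm:", float(np.linalg.norm(x)))
```

In this toy setup the DRAM footprint per layer is k rows instead of the full matrix, and the background load of the next layer's rows overlaps with the current layer's matmul, which is the intuition behind cross-layer preloading.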
