Source: Arxiv

MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism

  • MegaScale-Infer is an efficient system for serving large-scale Mixture-of-Experts (MoE) models.
  • It disaggregates the attention and feed-forward network (FFN) modules within each model layer onto separate groups of GPUs, allowing each to be scaled and optimized independently.
  • MegaScale-Infer introduces ping-pong pipeline parallelism, alternating micro-batches between the attention and FFN stages so that the GPUs freed up by exploiting MoE's sparsity are not left idle.
  • Experimental results show that MegaScale-Infer achieves higher per-GPU throughput than existing serving solutions.

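The scheduling idea behind ping-pong pipeline parallelism can be illustrated with a toy simulation. This is a hypothetical sketch, not MegaScale-Infer's actual implementation: the `attention` and `ffn` functions are stand-ins for the disaggregated module groups, and the point is only the schedule, in which one micro-batch occupies the FFN stage while another occupies the attention stage, so neither GPU group idles.

```python
# Hypothetical illustration of ping-pong pipeline parallelism across
# disaggregated attention and FFN stages (stand-in compute, not real kernels).

def attention(x):
    # Stand-in for the attention module running on the attention GPU group.
    return [v + 1 for v in x]

def ffn(x):
    # Stand-in for the expert FFN module running on the FFN GPU group.
    return [v * 2 for v in x]

def ping_pong_layer(micro_batches):
    """Run one decoder layer over two micro-batches.

    The schedule records which stage each micro-batch occupies at each
    time step; at step 1 the two stages are busy simultaneously, which is
    the overlap that ping-pong pipelining exploits.
    """
    schedule = []                      # entries: (time_step, micro_batch_id, stage)
    states = list(micro_batches)
    # Step 0: micro-batch 0 enters attention; micro-batch 1 waits.
    states[0] = attention(states[0]);  schedule.append((0, 0, "attn"))
    # Step 1: micro-batch 0 moves to FFN while micro-batch 1 enters attention.
    states[0] = ffn(states[0]);        schedule.append((1, 0, "ffn"))
    states[1] = attention(states[1]);  schedule.append((1, 1, "attn"))
    # Step 2: micro-batch 1 moves to FFN.
    states[1] = ffn(states[1]);        schedule.append((2, 1, "ffn"))
    return states, schedule

outputs, schedule = ping_pong_layer([[1, 2], [3, 4]])
print(outputs)   # [[4, 6], [8, 10]]
print(schedule)
```

In the real system the two stages live on different GPU groups and the hand-off is an all-to-all transfer, so the overlap at step 1 also hides communication latency behind computation.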