MegaScale-Infer is an efficient system for serving large-scale Mixture-of-Experts (MoE) models. It disaggregates the attention and feed-forward network (FFN) modules within each model layer, so each can be deployed and scaled independently. To keep both sets of nodes busy, MegaScale-Infer introduces ping-pong pipeline parallelism, which exploits MoE's sparsity by alternating micro-batches between attention and FFN nodes so the compute of one overlaps with that of the other. Experimental results show that MegaScale-Infer achieves higher per-GPU throughput than existing serving solutions.
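The ping-pong scheduling idea can be illustrated with a minimal Python sketch. This is not the actual GPU implementation: the `attention`, `ffn`, and `ping_pong_serve` names are hypothetical stand-ins, with sleeps in place of real compute, showing only the control flow in which two micro-batches alternate between an attention executor and an FFN executor so that the two stages overlap.

```python
import concurrent.futures as cf
import time

# Hypothetical stage functions; sleeps stand in for GPU compute and transfer.
def attention(mb, layer):
    time.sleep(0.01)  # attention compute on the attention nodes
    return mb

def ffn(mb, layer):
    time.sleep(0.01)  # expert (FFN) compute on the FFN nodes
    return mb

def ping_pong_serve(micro_batches, num_layers):
    """Alternate micro-batches between an attention executor and an FFN
    executor: while one micro-batch runs attention, the other runs FFN."""
    attn_pool = cf.ThreadPoolExecutor(max_workers=1)  # models the attention nodes
    ffn_pool = cf.ThreadPoolExecutor(max_workers=1)   # models the FFN nodes
    # Start every micro-batch at layer 0's attention stage.
    futures = {i: attn_pool.submit(attention, mb, 0)
               for i, mb in enumerate(micro_batches)}
    stage = {i: ("attn", 0) for i in futures}
    done = {}
    while futures:
        for i in list(futures):
            if not futures[i].done():
                continue
            kind, layer = stage[i]
            out = futures[i].result()
            if kind == "attn":
                # Hand the micro-batch over to the FFN side of this layer.
                futures[i] = ffn_pool.submit(ffn, out, layer)
                stage[i] = ("ffn", layer)
            elif layer + 1 < num_layers:
                # FFN done; bounce back to attention for the next layer.
                futures[i] = attn_pool.submit(attention, out, layer + 1)
                stage[i] = ("attn", layer + 1)
            else:
                done[i] = out
                del futures[i]
        time.sleep(0.001)
    attn_pool.shutdown()
    ffn_pool.shutdown()
    return [done[i] for i in sorted(done)]

results = ping_pong_serve([["req-a"], ["req-b"]], num_layers=2)
```

With equal stage costs, the two micro-batches finish in roughly the time one would take plus a single stage, since at any moment one occupies the attention executor while the other occupies the FFN executor.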