Source: Arxiv
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism

  • Scaling deep learning models with decentralized training runs into communication bottlenecks, especially under model parallelism.
  • A new compression algorithm compresses both the forward and backward passes, achieving up to 99% compression with negligible memory and compute overhead.
  • Exploiting the recursive structure of transformer networks, the method confines activations and gradients to a predefined low-dimensional subspace while still allowing full reconstruction in subsequent layers (see the sketch after this list).
  • This improves communication efficiency by up to 100x, enabling billion-parameter-scale models to be trained on low-end GPUs over consumer-grade internet speeds while matching the convergence of centralized datacenter systems.
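
Below is a minimal PyTorch sketch of the core idea as summarized above: activations crossing a model-parallel boundary are projected onto a fixed low-dimensional subspace before communication, and gradients get the same treatment on the backward pass. The SubspaceBoundary class, the basis Q, and the chosen dimensions are illustrative assumptions for this sketch, not the paper's actual implementation.

# Sketch (assumed names/shapes): compress activations and gradients at a
# model-parallel cut by projecting onto a fixed low-dimensional subspace.
import torch

class SubspaceBoundary(torch.autograd.Function):
    """Compress tensors crossing a model-parallel boundary to rank r."""

    @staticmethod
    def forward(ctx, x, Q):
        # x: (batch, d_model); Q: (d_model, r) with orthonormal columns.
        ctx.save_for_backward(Q)
        coords = x @ Q           # compress: only (batch, r) would be sent
        return coords @ Q.T      # reconstruct on the receiving device

    @staticmethod
    def backward(ctx, grad_out):
        (Q,) = ctx.saved_tensors
        # The same low-rank projection is applied to the gradient on the way back.
        grad_coords = grad_out @ Q        # (batch, r) sent backward
        return grad_coords @ Q.T, None    # reconstructed gradient w.r.t. x

if __name__ == "__main__":
    torch.manual_seed(0)
    d_model, r, batch = 1024, 16, 4
    Q, _ = torch.linalg.qr(torch.randn(d_model, r))   # fixed orthonormal basis
    x = torch.randn(batch, d_model, requires_grad=True)

    y = SubspaceBoundary.apply(x, Q)
    y.sum().backward()
    print("floats sent forward:", batch * r, "instead of", batch * d_model)

With the assumed sizes d_model = 1024 and r = 16, each hop sends roughly 1.6% of the original activation volume, which is in the same ballpark as the up-to-99% compression figure quoted above.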
