Source: Arxiv

Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training

  • Dramatic recent increases in the capabilities of neural network models have been driven by scaling up model size, training data, and the computation spent on training.
  • Scaling model size, training data, and total computation effectively in large-scale distributed training requires careful choice of hardware configuration and parallelization strategy (see the first sketch after this list).
  • An extensive empirical study of large-scale language model training workloads shows that some distributed communication strategies previously considered sub-optimal become preferable at sufficiently large scales.
  • Even with optimized hardware and parallelization strategies, scaling up the total number of hardware accelerators yields diminishing returns, with poor marginal performance per additional unit of power or GPU-hour (see the second sketch after this list).
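The trade-off between hardware configuration and parallelization strategy can be made concrete with a small sketch. Given a fixed accelerator count, a training job must choose data-, tensor-, and pipeline-parallel degrees whose product matches that count. The enumeration below is a minimal illustration under assumed constraints; the function name, the tensor-parallel cap, and the choice of three axes are hypothetical and not taken from the paper.

```python
from itertools import product

def parallel_configs(num_gpus: int, max_tp: int = 8):
    """Enumerate (data, tensor, pipeline) parallel degrees whose product
    equals num_gpus. max_tp caps tensor parallelism at a single node's GPU
    count (an assumption, since tensor parallelism is communication-heavy)."""
    configs = []
    for dp, tp, pp in product(range(1, num_gpus + 1), repeat=3):
        if dp * tp * pp == num_gpus and tp <= max_tp:
            configs.append((dp, tp, pp))
    return configs

if __name__ == "__main__":
    # For 64 GPUs, list every way to split work across the three axes.
    for dp, tp, pp in parallel_configs(64):
        print(f"data={dp:3d}  tensor={tp:2d}  pipeline={pp:3d}")
```

Each printed configuration distributes the same 64 GPUs differently, and each implies a different communication pattern; choosing among them is the kind of decision the study examines empirically.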

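The diminishing-returns finding can be illustrated with a toy strong-scaling model in which each added accelerator pays a communication penalty that grows with cluster size. The formula and constants below are assumptions made purely for illustration; the paper's conclusion rests on empirical measurements, not on this model.

```python
import math

def cluster_throughput(n_gpus: int,
                       per_gpu_tokens_per_s: float = 1.0e4,
                       comm_overhead: float = 0.08) -> float:
    """Toy model: aggregate tokens/s when every accelerator pays a
    communication penalty that grows logarithmically with cluster size.
    The constants are illustrative assumptions, not measured values."""
    efficiency = 1.0 / (1.0 + comm_overhead * math.log2(n_gpus))
    return n_gpus * per_gpu_tokens_per_s * efficiency

if __name__ == "__main__":
    # Track the throughput gained per GPU added in each doubling; it shrinks
    # as the cluster grows, i.e. worse return per extra GPU-hour.
    prev = cluster_throughput(1)
    for n in (2, 4, 8, 16, 32, 64, 128, 256):
        cur = cluster_throughput(n)
        marginal = (cur - prev) / (n - n // 2)
        print(f"{n:4d} GPUs: {cur:12.0f} tok/s total, "
              f"{marginal:8.0f} tok/s per GPU added in last doubling")
        prev = cur
```

Running the script prints total throughput alongside the throughput gained per GPU added in the most recent doubling, and the latter shrinks steadily, mirroring the qualitative pattern of diminishing returns described above.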