Compute nodes on modern heterogeneous supercomputing systems consist of CPUs, GPUs, and high-speed network interconnects. Parallelization enables scalable simulation and deep learning workloads on these systems, but the communication required by distributed execution can become a performance bottleneck. This survey explores GPU-centric communication schemes that move the control path from the CPU to the GPU.