Compute nodes on modern heterogeneous supercomputing systems consist of CPUs, GPUs, and high-speed network interconnects. Parallelization enables scalable simulation and deep learning workloads on these systems, but the communication required by distributed execution can become a performance bottleneck. This survey explores GPU-centric communication schemes that move the control path from the CPU to the GPU.