<ul><li>DeepSeek AI has released DeepGEMM, an FP8 GEMM library for efficient matrix multiplication in deep learning and high-performance computing.</li><li>DeepGEMM supports both standard and Mixture-of-Experts (MoE) grouped GEMMs and targets NVIDIA Hopper tensor cores.</li><li>The library uses fine-grained scaling and a two-level accumulation strategy to keep FP8 arithmetic accurate without sacrificing performance.</li><li>DeepGEMM achieves speedups of up to 2.7x for normal GEMMs and 1.1x to 1.2x for grouped GEMMs.</li></ul>
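
To make the fine-grained scaling and two-level accumulation idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not DeepGEMM's API or kernel: the names `fake_fp8`, `quantize_blocks`, and `scaled_gemm` are hypothetical, the 128-element scaling group is an assumed block size, both operands are scaled per row block for simplicity, and FP8 (E4M3) rounding is only crudely emulated in float32.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3
BLOCK = 128           # assumed scaling-group size along K (illustrative only)


def fake_fp8(x):
    """Crude E4M3 emulation: clamp to range and keep 4 significant bits.

    Subnormals and underflow are ignored; this is not bit-exact FP8.
    """
    x = np.clip(x, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    m, e = np.frexp(x)             # x = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 16) / 16      # 1 implicit + 3 explicit mantissa bits
    return np.ldexp(m, e).astype(np.float32)


def quantize_blocks(x):
    """Quantize each row in groups of BLOCK along K, with one scale per group."""
    rows, k = x.shape
    groups = x.reshape(rows, k // BLOCK, BLOCK)
    amax = np.abs(groups).max(axis=-1, keepdims=True)
    scale = (np.maximum(amax, 1e-12) / FP8_E4M3_MAX).astype(np.float32)
    return fake_fp8(groups / scale), scale


def scaled_gemm(a, b):
    """Compute A @ B.T from block-quantized operands (a: M x K, b: N x K)."""
    qa, sa = quantize_blocks(a)    # shapes (M, G, BLOCK) and (M, G, 1)
    qb, sb = quantize_blocks(b)    # shapes (N, G, BLOCK) and (N, G, 1)
    acc = np.zeros((a.shape[0], b.shape[0]), dtype=np.float32)
    for g in range(qa.shape[1]):
        # Level 1: partial product over a single K-block of quantized values.
        partial = qa[:, g, :] @ qb[:, g, :].T
        # Level 2: apply both block scales and add into a separate accumulator.
        acc += partial * (sa[:, g] * sb[:, g].T)
    return acc


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 256)).astype(np.float32)
b = rng.standard_normal((3, 256)).astype(np.float32)
rel_err = np.linalg.norm(scaled_gemm(a, b) - a @ b.T) / np.linalg.norm(a @ b.T)
print(f"relative error vs. float32 GEMM: {rel_err:.4f}")
```

Because each 128-element group carries its own scale, an outlier only costs precision within its own group, and rescaling each partial product before adding it to a separate accumulator keeps rounding error from compounding along a long K dimension.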

DeepSeek AI Releases DeepGEMM: An FP8 GEMM Library that Supports both Dense and MoE GEMMs Powering V3/R1 Training and Inference
