DeepSeek AI has released DeepGEMM, an FP8 GEMM (general matrix multiplication) library for efficient matrix multiplications in deep learning and high-performance computing. DeepGEMM supports both standard GEMMs and the grouped GEMMs used in Mixture-of-Experts (MoE) models, and it is built for the tensor cores of NVIDIA Hopper GPUs. To keep FP8 arithmetic accurate without sacrificing performance, the library combines fine-grained scaling with a two-level accumulation strategy. DeepGEMM reports speedups of up to 2.7x for normal GEMMs and 1.1x to 1.2x for grouped GEMMs.
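The following minimal pure-Python sketch simulates how fine-grained scaling and two-level accumulation work together numerically. It is not DeepGEMM's CUDA implementation or API. The names `round_to_e4m3`, `quantize_block`, and `fp8_gemm_blockwise` are hypothetical, FP8 E4M3 values are emulated in float64, and the scaling granularity is a simplifying assumption that need not match DeepGEMM's. Here each row of A and each column of B gets one scale per K-block. Giving every K-block its own scale lets both small and large values fit FP8's narrow range. Products within a block are summed first, and that partial sum is then multiplied by the two block scales and added into a full-precision accumulator.

```python
import math
import random

FP8_E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (emulated in float64, saturating at +/-448)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), FP8_E4M3_MAX)
    _, e = math.frexp(a)       # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)       # unbiased exponent; -6 is E4M3's smallest normal exponent
    step = 2.0 ** (exp - 3)    # spacing with 3 explicit mantissa bits
    return sign * min(round(a / step) * step, FP8_E4M3_MAX)


def quantize_block(values):
    """Quantize one block to E4M3 using a single scale derived from the block's max |value|."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def fp8_gemm_blockwise(A, B, block_k=4):
    """C = A @ B with per-K-block FP8 scaling and two-level accumulation (hypothetical sketch)."""
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0.0] * N for _ in range(M)]  # full-precision (outer) accumulator
    for k0 in range(0, K, block_k):
        k1 = min(k0 + block_k, K)
        # Assumed granularity: one scale per row of A and per column of B in this K-block.
        a_blocks = [quantize_block(A[i][k0:k1]) for i in range(M)]
        b_blocks = [quantize_block([B[k][j] for k in range(k0, k1)]) for j in range(N)]
        for i in range(M):
            qa, sa = a_blocks[i]
            for j in range(N):
                qb, sb = b_blocks[j]
                # Level 1: partial dot product of the quantized values. On hardware this is
                # the tensor-core accumulator; Python float64 does not model its limited precision.
                partial = sum(x * y for x, y in zip(qa, qb))
                # Level 2: apply both block scales and promote into the full-precision accumulator.
                C[i][j] += partial * sa * sb
    return C


if __name__ == "__main__":
    random.seed(0)
    A = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(3)]
    B = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(16)]
    C = fp8_gemm_blockwise(A, B)
    ref = [[sum(A[i][k] * B[k][j] for k in range(16)) for j in range(5)] for i in range(3)]
    print("max abs error:", max(abs(C[i][j] - ref[i][j]) for i in range(3) for j in range(5)))
```

Without a per-block scale, small values can fall below FP8's representable range. For example, `round_to_e4m3(1e-5)` returns `0.0`. By contrast, quantizing a block whose own maximum is `1e-5` maps that value to 448, and rescaling recovers it.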