Source: Arxiv

Recipes for Pre-training LLMs with MXFP8

  • Training with fewer bits of precision is being used to pre-train LLMs more efficiently on GPUs without sacrificing accuracy.
  • NVIDIA's latest Blackwell GPUs support Microscaling (MX) formats, which combine narrow floating-point data types with per-block scaling factors to quantize tensors.
  • While MX formats offer improved numeric stability, careful use is required for LLM training to converge on large datasets.
  • The study proposes an improved rounding mode, rounding toward infinity when computing scaling factors, which enabled successful MXFP8 pre-training of an 8B-parameter model on 15T tokens (a sketch of the idea follows below).

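To make the per-block scaling concrete, here is a minimal NumPy sketch of MXFP8-style block quantization. It assumes E4M3 elements (maximum magnitude 448) and 32-element blocks, and contrasts a floor-rounded power-of-two scale with the round-toward-infinity rule the summary describes; the function names and the clip-based quantizer are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Illustrative constants for MXFP8: E4M3 elements (max magnitude 448)
# scaled per contiguous block of 32 values. Names are assumptions for
# this sketch, not taken from any particular library.
E4M3_MAX = 448.0
BLOCK_SIZE = 32


def mx_scale_exponent(block_amax, mode="ceil"):
    """Power-of-two scale exponent for one block.

    mode="floor" rounds the exponent down (the behaviour the recipe
    improves on); mode="ceil" rounds it up toward infinity so that
    amax / 2**exp never exceeds the E4M3 range.
    """
    if block_amax == 0.0:
        return 0
    exp = np.log2(block_amax / E4M3_MAX)
    return int(np.ceil(exp)) if mode == "ceil" else int(np.floor(exp))


def quantize_mxfp8_block(block, mode="ceil"):
    """Divide a block by its 2**exp scale and clip to the E4M3 range.

    A real MXFP8 kernel would also round each element to the E4M3 grid;
    the clip step is enough to show where floor-rounded scales saturate.
    """
    exp = mx_scale_exponent(np.max(np.abs(block)), mode)
    scaled = block / (2.0 ** exp)
    return np.clip(scaled, -E4M3_MAX, E4M3_MAX), exp


# Block whose amax (672) is not an exact power-of-two multiple of 448:
# the floor rule yields scale 2**0 and the largest value clips to 448,
# while the round-up rule yields scale 2**1 and keeps it representable.
block = np.full(BLOCK_SIZE, 1.0)
block[0] = 672.0
for mode in ("floor", "ceil"):
    q, exp = quantize_mxfp8_block(block, mode)
    print(f"{mode}: scale=2**{exp}, max quantized value={q.max()}")
```

In this toy example, dequantizing with the round-up scale (336 * 2 = 672) recovers the original magnitude, whereas the floor-rounded scale has already clipped the value to 448; avoiding that kind of overflow is the point of rounding the scale toward infinity.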