SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

  • Attention efficiency is crucial because of its quadratic time complexity with respect to sequence length.
  • SageAttention3 accelerates attention by leveraging the new FP4 Tensor Cores on Blackwell GPUs, achieving a 5x speedup over the fastest FlashAttention on an RTX 5090 (a minimal sketch of the block-wise, microscaled quantization idea follows this list).
  • The paper also brings low-bit attention to training tasks, exploring its effectiveness in both forward and backward propagation.
  • Experiments show that 8-bit attention achieves lossless performance in fine-tuning tasks but converges more slowly in pretraining tasks.
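The sketch below only illustrates the general idea of microscaling quantization (shared per-block scales plus FP4 E2M1-representable values), simulated in PyTorch; it is not the paper's kernel, which runs natively on Blackwell FP4 Tensor Cores. The function names, the block size of 16, and the max-based scaling rule are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: simulate microscaling FP4 quantization of attention
# inputs in PyTorch. Real SageAttention3 uses Blackwell FP4 Tensor Cores;
# here we only emulate the numerics to show how per-block scales limit error.
import torch

# Magnitudes representable in FP4 E2M1 (sign handled separately).
FP4_E2M1_LEVELS = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_microscaled(x: torch.Tensor, block_size: int = 16):
    """Quantize the last dim of x in blocks of `block_size`; each block
    shares one scale so its max magnitude maps to 6.0 (FP4 E2M1 max)."""
    orig_shape = x.shape
    x = x.reshape(-1, block_size)                      # (num_blocks, block_size)
    scale = x.abs().amax(dim=-1, keepdim=True) / 6.0   # per-block scale
    scale = torch.where(scale == 0, torch.ones_like(scale), scale)
    scaled = x / scale
    # Round each element to the nearest FP4-representable magnitude.
    levels = FP4_E2M1_LEVELS.to(x.device, x.dtype)
    idx = (scaled.abs().unsqueeze(-1) - levels).abs().argmin(dim=-1)
    q = levels[idx] * scaled.sign()
    return q.reshape(orig_shape), scale.reshape(*orig_shape[:-1], -1)

def dequantize(q: torch.Tensor, scale: torch.Tensor, block_size: int = 16):
    """Rescale quantized blocks back to their original range."""
    return (q.reshape(-1, block_size) * scale.reshape(-1, 1)).reshape(q.shape)

# Example: quantize Q and K, then compare attention scores computed on the
# dequantized tensors against the full-precision reference.
q = torch.randn(2, 8, 64)
k = torch.randn(2, 8, 64)
q4, qs = quantize_fp4_microscaled(q)
k4, ks = quantize_fp4_microscaled(k)
scores_ref = q @ k.transpose(-1, -2)
scores_fp4 = dequantize(q4, qs) @ dequantize(k4, ks).transpose(-1, -2)
print("mean abs error:", (scores_ref - scores_fp4).abs().mean().item())
```

Because each small block carries its own scale, outliers in one block do not degrade the precision of the rest of the tensor, which is the basic reason microscaling formats can hold accuracy at 4 bits.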
