Arxiv

FT-Transformer: Resilient and Reliable Transformer with End-to-End Fault Tolerant Attention

  • Researchers propose an error-resilient framework called end-to-end fault tolerant attention (EFTA) for Transformer models.
  • EFTA incorporates error detection and correction within a fully fused attention kernel, reducing redundant data access and mitigating memory faults.
  • The framework introduces architecture-aware algorithm-based fault tolerance (ABFT) that uses tensor checksums to minimize communication overhead during error detection.
  • Experimental results show that EFTA achieves up to 7.56x speedup over traditional methods with an average fault tolerance overhead of 13.9%.
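ABFT rests on a simple invariant. If C = A·B, then the column sums of C must equal (1ᵀA)·B and the row sums of C must equal A·(B1). A single corrupted entry shows up as exactly one mismatched row sum and one mismatched column sum, and the size of the mismatch is the error, so the entry can be repaired in place. The sketch below applies that classic row/column-checksum scheme (Huang-Abraham style) to the two matrix products inside attention, QKᵀ and PV. It is a minimal, unfused, pure-Python illustration under stated assumptions. It is not EFTA's fused kernel or its architecture-aware tensor-checksum layout. The function names, the `fault` injection tuple, and the tolerance `tol` are hypothetical choices for demonstration, and the softmax step between the two products is left unprotected here.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product a @ b."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def abft_correct(a: Matrix, b: Matrix, c: Matrix, tol: float = 1e-9) -> str:
    """Verify c == a @ b via row/column checksums; repair one bad entry in place.

    Returns "ok", "corrected", or "uncorrectable". (Hypothetical helper.)
    """
    n, k, m = len(a), len(b), len(b[0])
    # Reference checksums encoded from the inputs: (1^T a) @ b and a @ (b 1).
    col_sum_a = [sum(a[i][p] for i in range(n)) for p in range(k)]
    row_sum_b = [sum(b[p][j] for j in range(m)) for p in range(k)]
    exp_cols = [sum(col_sum_a[p] * b[p][j] for p in range(k)) for j in range(m)]
    exp_rows = [sum(a[i][p] * row_sum_b[p] for p in range(k)) for i in range(n)]
    # Checksums recomputed from the (possibly corrupted) output.
    got_cols = [sum(c[i][j] for i in range(n)) for j in range(m)]
    got_rows = [sum(c[i]) for i in range(n)]

    def mismatched(got: list[float], exp: list[float]) -> list[int]:
        # Relative tolerance with an absolute floor absorbs float rounding.
        return [x for x, (g, e) in enumerate(zip(got, exp)) if abs(g - e) > tol * (1.0 + abs(e))]

    bad_rows = mismatched(got_rows, exp_rows)
    bad_cols = mismatched(got_cols, exp_cols)
    if not bad_rows and not bad_cols:
        return "ok"
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        c[i][j] -= got_rows[i] - exp_rows[i]  # the row discrepancy equals the error
        return "corrected"
    return "uncorrectable"


def softmax_rows(s: Matrix) -> Matrix:
    """Numerically stable row-wise softmax (not checksum-protected here)."""
    out = []
    for row in s:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def fault_tolerant_attention(q: Matrix, k: Matrix, v: Matrix, fault=None):
    """softmax(q k^T / sqrt(d)) v with an ABFT check after each matmul.

    `fault` = (stage, i, j, delta) optionally injects an error into the
    "scores" or "output" matrix, purely for demonstration.
    """
    d = len(q[0])
    kt = transpose(k)
    s = matmul(q, kt)
    if fault and fault[0] == "scores":
        s[fault[1]][fault[2]] += fault[3]
    status_s = abft_correct(q, kt, s)
    scale = 1.0 / math.sqrt(d)
    p = softmax_rows([[x * scale for x in row] for row in s])
    o = matmul(p, v)
    if fault and fault[0] == "output":
        o[fault[1]][fault[2]] += fault[3]
    status_o = abft_correct(p, v, o)
    return o, (status_s, status_o)
```

For example, injecting `fault=("scores", 1, 2, 5.0)` yields the status `("corrected", "ok")` and the same output as a fault-free run. Checking each product separately like this reads and writes the checksums in extra passes over memory. The summary above describes EFTA as fusing detection and correction into a single attention kernel to avoid exactly that kind of redundant data access.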
