Researchers propose an error-resilient framework called end-to-end fault tolerant attention (EFTA) for Transformer models.
EFTA incorporates error detection and correction within a fully fused attention kernel, reducing redundant data access and mitigating memory faults.
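As a rough picture of what fusing detection into the kernel means, the NumPy sketch below checks each intermediate of attention while it is still live, instead of writing it out and re-reading it in a separate verification pass. The specific checks shown (finiteness of the score matrix, unit row sums of the softmax) and the name `fused_attention_with_checks` are illustrative placeholders, not EFTA's actual GPU detection scheme.

```python
import numpy as np

def fused_attention_with_checks(Q, K, V, tol=1e-5):
    """Illustrative sketch: verify intermediates while they are live,
    so no tensor is re-read later just to be checked."""
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)                     # attention scores
    if not np.isfinite(S).all():                 # check S before it is consumed
        raise RuntimeError("fault detected in QK^T")
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)           # softmax
    # Softmax rows sum to 1 by construction; a deviation flags a fault.
    if np.max(np.abs(P.sum(axis=-1) - 1.0)) > tol:
        raise RuntimeError("fault detected in softmax")
    return P @ V                                 # attention output
```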
The framework introduces architecture-aware algorithm-based fault tolerance (ABFT) using tensor checksums to minimize communication overhead during error detection.
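The tensor-checksum idea builds on classic ABFT for matrix multiplication. As a rough illustration (not EFTA's architecture-aware layout, which is organized around tensor-core fragments), the sketch below uses Huang-Abraham style row and column checksums in NumPy: the product carries its own checksum row and column, a mismatch flags the faulty row and column, and their intersection can be repaired without recomputation. The function names are hypothetical.

```python
import numpy as np

def encode_and_multiply(A, B):
    # Append a column-sum row to A and a row-sum column to B, then perform
    # one multiply; the product then carries its own checksums.
    A_enc = np.vstack([A, A.sum(axis=0, keepdims=True)])
    B_enc = np.hstack([B, B.sum(axis=1, keepdims=True)])
    return A_enc @ B_enc

def detect_and_correct(C_enc, tol=1e-6):
    C = C_enc[:-1, :-1].copy()
    col_res = C_enc[-1, :-1] - C.sum(axis=0)     # nonzero entry -> faulty column
    row_res = C_enc[:-1, -1] - C.sum(axis=1)     # nonzero entry -> faulty row
    bad_rows = np.flatnonzero(np.abs(row_res) > tol)
    bad_cols = np.flatnonzero(np.abs(col_res) > tol)
    if bad_rows.size == 1 and bad_cols.size == 1:  # single-fault case
        i, j = bad_rows[0], bad_cols[0]
        C[i, j] += row_res[i]                    # restore the corrupted element
    return C

# Demo: corrupt one element of the product and let the checksums repair it.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((64, 32)), rng.standard_normal((32, 48))
C_enc = encode_and_multiply(A, B)
C_enc[10, 20] += 5.0                             # simulated memory fault
assert np.allclose(detect_and_correct(C_enc), A @ B)
```

Because verification reduces to comparing two checksum vectors per tile rather than duplicating the whole computation, checksum traffic grows with the matrix dimension rather than its area, which is the general reason checksum-based detection keeps communication overhead low.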
Experimental results show that EFTA achieves up to 7.56x speedup over traditional methods with an average fault tolerance overhead of 13.9%.