Researchers propose an error-resilient framework called end-to-end fault tolerant attention (EFTA) for Transformer models.
EFTA incorporates error detection and correction within a fully fused attention kernel, reducing redundant data access and mitigating memory faults.
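As a rough picture of what fusing detection into the kernel means, the NumPy sketch below checks each intermediate of attention while it is still live, instead of writing it out and re-reading it in a separate verification pass. The specific checks shown (finiteness of the score matrix, unit row sums of the softmax) and the name `fused_attention_with_checks` are illustrative placeholders, not EFTA's actual GPU detection scheme.

```python
import numpy as np

def fused_attention_with_checks(Q, K, V, tol=1e-5):
    """Illustrative sketch: verify intermediates while they are live,
    so no tensor is re-read later just to be checked."""
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)                     # attention scores
    if not np.isfinite(S).all():                 # check S before it is consumed
        raise RuntimeError("fault detected in QK^T")
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)           # softmax
    # Softmax rows sum to 1 by construction; a deviation flags a fault.
    if np.max(np.abs(P.sum(axis=-1) - 1.0)) > tol:
        raise RuntimeError("fault detected in softmax")
    return P @ V                                 # attention output
```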
The framework introduces architecture-aware algorithm-based fault tolerance (ABFT) using tensor checksums to minimize communication overhead during error detection.
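The tensor-checksum idea builds on classic ABFT for matrix multiplication. As a rough illustration (not EFTA's architecture-aware layout, which is organized around tensor-core fragments), the sketch below uses Huang-Abraham style row and column checksums in NumPy: the product carries its own checksum row and column, a mismatch flags the faulty row and column, and their intersection can be repaired without recomputation. The function names are hypothetical.

```python
import numpy as np

def encode_and_multiply(A, B):
    # Append a column-sum row to A and a row-sum column to B, then perform
    # one multiply; the product then carries its own checksums.
    A_enc = np.vstack([A, A.sum(axis=0, keepdims=True)])
    B_enc = np.hstack([B, B.sum(axis=1, keepdims=True)])
    return A_enc @ B_enc

def detect_and_correct(C_enc, tol=1e-6):
    C = C_enc[:-1, :-1].copy()
    col_res = C_enc[-1, :-1] - C.sum(axis=0)     # nonzero entry -> faulty column
    row_res = C_enc[:-1, -1] - C.sum(axis=1)     # nonzero entry -> faulty row
    bad_rows = np.flatnonzero(np.abs(row_res) > tol)
    bad_cols = np.flatnonzero(np.abs(col_res) > tol)
    if bad_rows.size == 1 and bad_cols.size == 1:  # single-fault case
        i, j = bad_rows[0], bad_cols[0]
        C[i, j] += row_res[i]                    # restore the corrupted element
    return C

# Demo: corrupt one element of the product and let the checksums repair it.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((64, 32)), rng.standard_normal((32, 48))
C_enc = encode_and_multiply(A, B)
C_enc[10, 20] += 5.0                             # simulated memory fault
assert np.allclose(detect_and_correct(C_enc), A @ B)
```

Because verification reduces to comparing two checksum vectors per tile rather than duplicating the whole computation, checksum traffic grows with the matrix dimension rather than its area, which is the general reason checksum-based detection keeps communication overhead low.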
Experimental results show that EFTA achieves up to 7.56x speedup over traditional methods with an average fault tolerance overhead of 13.9%.