Fault tolerance in Deep Neural Networks (DNNs) deployed on resource-constrained systems presents unique challenges for high-accuracy applications with strict timing requirements.
NAPER, a novel protection approach, combines ensemble learning with heterogeneous model redundancy to achieve higher accuracy than traditional redundancy methods.
It pairs an efficient fault detection mechanism with a real-time scheduler that prioritizes meeting deadlines and keeps the system operating without interruption during fault recovery.
Comparative evaluations show that NAPER offers 40% faster inference and 4.2% higher accuracy than strategies based on triple modular redundancy (TMR), while effectively balancing accuracy, reliability, and timeliness in real-time DNN applications.
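
As a rough illustration of heterogeneous redundancy with fault detection, the following minimal Python sketch runs several different models on the same input, takes a majority vote, and flags any model that disagrees with the majority as a candidate for recovery. It is not NAPER's actual mechanism: the majority-vote rule, the disagreement-based detection, the function `ensemble_predict`, and the three toy models are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical model type: maps an input vector to a class label.
Model = Callable[[Sequence[float]], int]


def ensemble_predict(models: Sequence[Model], x: Sequence[float]) -> Tuple[int, List[int]]:
    """Return the majority label and the indices of models that disagree with it.

    Disagreeing models are only *suspected* faulty; a real system would need
    a stronger check before triggering recovery (assumption of this sketch).
    """
    votes = [model(x) for model in models]
    # most_common(1) yields the label with the highest vote count;
    # ties are resolved in favor of the label seen first.
    label, _ = Counter(votes).most_common(1)[0]
    suspects = [i for i, vote in enumerate(votes) if vote != label]
    return label, suspects


# Three toy "heterogeneous" classifiers (illustrative stand-ins for DNNs).
def sum_model(x: Sequence[float]) -> int:
    return 1 if sum(x) > 1.0 else 0


def max_model(x: Sequence[float]) -> int:
    return 1 if max(x) > 0.5 else 0


def faulty_model(x: Sequence[float]) -> int:
    return 0  # simulates a model whose output is stuck due to a fault


label, suspects = ensemble_predict([sum_model, max_model, faulty_model], [0.2, 0.9])
print(label, suspects)  # prints: 1 [2]
```

In this sketch the two healthy models still produce an answer while the third is flagged, which mirrors the idea of continuing to serve predictions while a faulty replica is being recovered.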