This work proposes a mathematically grounded mixed precision accumulation strategy for neural network inference.
The strategy is based on a componentwise forward error analysis that explains how errors propagate through the forward pass of a neural network.
The analysis shows that the error in each component of a layer's output is proportional to the product of the condition numbers of the weights and the input, times the condition number of the activation function.
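To make the shape of such a bound concrete, consider a layer $y = \phi(Wx)$; a schematic componentwise bound (the notation below is assumed for illustration, and the exact constants and definitions are those of the paper's analysis) reads
\[
\frac{|\widehat{y}_i - y_i|}{|y_i|} \;\lesssim\; \kappa_\phi\!\big(w_i^{\top}x\big)\,\kappa\big(w_i, x\big)\, u_i,
\qquad
\kappa(w_i, x) = \frac{\sum_j |w_{ij}||x_j|}{\big|\sum_j w_{ij}x_j\big|},
\qquad
\kappa_\phi(t) = \left|\frac{t\,\phi'(t)}{\phi(t)}\right|,
\]
where $u_i$ is the unit roundoff of the precision used to accumulate component $i$: the larger the condition numbers, the more the accumulation error is amplified.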
The proposed algorithm uses this analysis to select accumulation precisions inversely proportional to these condition numbers, leading to an improved cost-accuracy tradeoff compared with uniform precision accumulation baselines.
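A minimal sketch of this idea (not the paper's reference implementation) is shown below: each output component is assigned the cheapest accumulation format whose unit roundoff, amplified by the estimated condition numbers, stays below a target error. The format set, thresholds, and helper names are illustrative assumptions.

import numpy as np

# Unit roundoffs of the candidate accumulation formats (assumed set).
UNIT_ROUNDOFF = {"bf16": 2.0**-8, "fp16": 2.0**-11, "fp32": 2.0**-24}

def inner_product_condition(W, x, eps=1e-30):
    """Componentwise condition numbers of the inner products W @ x."""
    num = np.abs(W) @ np.abs(x)     # sum_j |w_ij| |x_j|
    den = np.abs(W @ x) + eps       # |sum_j w_ij x_j|
    return num / den

def choose_precisions(W, x, target_error, kappa_act=1.0):
    """For each output component, pick the lowest precision whose roundoff
    times the condition numbers is below the target error."""
    kappa = inner_product_condition(W, x) * kappa_act
    # Order formats from largest to smallest unit roundoff (cheapest first).
    formats = sorted(UNIT_ROUNDOFF, key=UNIT_ROUNDOFF.get, reverse=True)
    choices = []
    for k in kappa:
        for fmt in formats:
            if k * UNIT_ROUNDOFF[fmt] <= target_error:
                choices.append(fmt)
                break
        else:
            choices.append(formats[-1])  # fall back to the most accurate format
    return choices

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, x = rng.standard_normal((4, 8)), rng.standard_normal(8)
    print(choose_precisions(W, x, target_error=1e-3))

In this sketch, well-conditioned components are accumulated in low precision at lower cost, while ill-conditioned components keep a higher precision, which is the mechanism behind the claimed cost-accuracy improvement.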