This paper proposes a general framework for improving the training of low-precision networks by replacing blocks of their full-precision counterparts.
The framework lets the full-precision counterpart guide the low-precision network during training, addressing the limitations of training low-precision networks directly.
By replacing blocks one at a time, the framework generates intermediate mixed-precision models in which quantized low-precision blocks are integrated into otherwise full-precision networks.
Experimental results demonstrate that the proposed method achieves state-of-the-art accuracy for 4-, 3-, and 2-bit quantization on ImageNet and CIFAR-10.
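To illustrate the block-by-block replacement idea, the following is a minimal sketch assuming a PyTorch-style network organized as a sequence of blocks. The `QuantBlock` wrapper, the uniform fake-quantization routine, and the toy three-block network are hypothetical placeholders for exposition only, not the paper's implementation or training procedure.

```python
# Sketch of generating intermediate mixed-precision models by replacing
# full-precision blocks with quantized copies, one block at a time.
# Assumptions: PyTorch, a network expressed as nn.Sequential of blocks,
# and a naive uniform symmetric fake-quantization of weights.
import copy
import torch
import torch.nn as nn


def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization of a weight tensor (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale


class QuantBlock(nn.Module):
    """A copy of a full-precision block whose weights are fake-quantized to `bits`."""

    def __init__(self, block: nn.Module, bits: int):
        super().__init__()
        self.block = copy.deepcopy(block)
        with torch.no_grad():
            for p in self.block.parameters():
                p.copy_(fake_quantize(p, bits))

    def forward(self, x):
        return self.block(x)


def build_mixed_precision_models(fp_model: nn.Sequential, bits: int):
    """Yield intermediate models where the first k blocks are quantized copies
    and the remaining blocks stay full-precision."""
    for k in range(1, len(fp_model) + 1):
        blocks = [
            QuantBlock(b, bits) if i < k else copy.deepcopy(b)
            for i, b in enumerate(fp_model)
        ]
        yield nn.Sequential(*blocks)


# Usage: a toy full-precision network of three blocks.
fp_net = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10)),
)
for k, mixed in enumerate(build_mixed_precision_models(fp_net, bits=4), start=1):
    out = mixed(torch.randn(2, 3, 32, 32))
    print(f"mixed-precision model with {k} quantized block(s): output {tuple(out.shape)}")
```

In this sketch each intermediate model keeps the remaining full-precision blocks intact, so the quantized blocks can be trained while the full-precision partner supplies the rest of the computation; the specific guidance and training losses used by the paper are not reproduced here.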