Researchers propose a method for fully quantized Winograd convolution to reduce computational and storage costs in large-scale text-to-image diffusion models.
Previous work has explored quantizing diffusion models to reduce compute cost and memory bandwidth usage.
The proposed method uses finer-grained, group-wise quantization and finetunes the scale parameters of the Winograd transform matrices.
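To make the idea concrete, here is a minimal sketch of group-wise quantization applied in the Winograd domain. It is an illustration under stated assumptions, not the authors' implementation: it uses the standard 1D Winograd F(2,3) transform matrices, symmetric 8-bit fake quantization with one max-abs scale per group, and hypothetical helper names (`fake_quant_groupwise`, `winograd_f23`). The finetuning of the transform-matrix scale parameters is not shown.

```python
import numpy as np

# Standard 1D Winograd F(2,3) transform matrices (2 outputs, 3-tap filter).
BT = np.array([[1, 0, -1, 0],
               [0, 1, 1, 0],
               [0, -1, 1, 0],
               [0, 1, 0, -1]], dtype=np.float64)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
AT = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=np.float64)


def fake_quant_groupwise(x, group_size, bits=8):
    """Quantize then dequantize x, using one symmetric scale per group.

    Assumption: x.size is divisible by group_size, and groups are taken
    over consecutive elements of the flattened array.
    """
    qmax = 2 ** (bits - 1) - 1
    flat = x.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero groups
    q = np.clip(np.round(flat / scale), -qmax, qmax)
    return (q * scale).reshape(x.shape), scale


def winograd_f23(d, g, group_size=None):
    """Compute 2 outputs of a 3-tap correlation over a 4-element input tile.

    If group_size is given, the transformed filter and input are
    fake-quantized group-wise before the element-wise product.
    """
    U = G @ g    # filter in the Winograd domain (4 values)
    V = BT @ d   # input tile in the Winograd domain (4 values)
    if group_size is not None:
        U, _ = fake_quant_groupwise(U, group_size)
        V, _ = fake_quant_groupwise(V, group_size)
    return AT @ (U * V)


d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile
g = np.array([0.5, -1.0, 0.25])      # filter taps
y_float = winograd_f23(d, g)                 # unquantized reference
y_quant = winograd_f23(d, g, group_size=2)   # group-wise quantized
```

Smaller groups give each scale fewer values to cover, so an outlier in one group does not inflate the quantization step for the rest of the tensor. The cost is that more scales have to be stored and applied.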
The method achieves near-lossless quality in text-to-image generation and outperforms state-of-the-art methods in image classification tasks.