Source: Arxiv
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning

  • Model quantization reduces the bit-width of weights and activations, improving memory efficiency and inference speed in diffusion models.
  • The paper identifies three obstacles to 4-bit quantization: asymmetric activation distributions, the temporal complexity of the denoising process during fine-tuning, and misalignment between the fine-tuning loss and the quantization error.
  • To overcome these challenges, the authors propose a mixup-sign floating-point quantization (MSFP) framework that introduces unsigned FP quantization, timestep-aware LoRA (TALoRA), and denoising-factor loss alignment (DFA) for precise and stable fine-tuning (a rough sketch of the unsigned FP idea follows this list).
  • Extensive experiments show superior 4-bit FP quantization of diffusion models, surpassing even existing post-training quantization fine-tuning methods that use 4-bit integer quantization.
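The summary names the techniques only at a high level. As a rough illustration (not the paper's MSFP implementation), the sketch below simulates low-bit floating-point quantization in NumPy and shows why an unsigned FP format can help for non-negative activations: dropping the sign bit frees one bit for the mantissa at the same 4-bit budget. The exponent/mantissa splits, the exponent bias of 1, and the per-tensor max scaling used here are illustrative assumptions, not details taken from the paper.

# Minimal sketch of simulated ("fake") low-bit FP quantization.
# Not the paper's MSFP framework; format and scaling choices are assumptions.
import numpy as np

def fp_grid(exp_bits: int, man_bits: int, signed: bool) -> np.ndarray:
    """Enumerate the values representable by a tiny FP format (bias assumed to be 1)."""
    levels = [0.0]
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            if e == 0:
                val = (m / 2 ** man_bits) * 2.0 ** 0          # subnormal values
            else:
                val = (1 + m / 2 ** man_bits) * 2.0 ** (e - 1)  # normal values
            levels.append(val)
    grid = np.unique(np.array(levels))
    return np.concatenate([-grid[::-1], grid]) if signed else grid

def fake_quantize(x: np.ndarray, exp_bits: int, man_bits: int, signed: bool) -> np.ndarray:
    """Quantize-dequantize x onto the FP grid using a per-tensor max scale."""
    grid = fp_grid(exp_bits, man_bits, signed)
    scale = max(float(np.abs(x).max()) / float(np.abs(grid).max()), 1e-12)
    idx = np.abs(x[..., None] / scale - grid).argmin(axis=-1)  # nearest representable level
    return grid[idx] * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.random((4, 8)).astype(np.float32)  # non-negative activations
    signed_q = fake_quantize(acts, exp_bits=2, man_bits=1, signed=True)    # 4-bit signed FP (1 sign bit)
    unsigned_q = fake_quantize(acts, exp_bits=2, man_bits=2, signed=False)  # same 4-bit budget, no sign bit
    print("signed FP4 error:  ", np.abs(acts - signed_q).mean())
    print("unsigned FP4 error:", np.abs(acts - unsigned_q).mean())

On non-negative inputs the unsigned variant typically reports a lower mean error, since all of its codes land on the usable range; this mirrors the motivation the summary gives for applying unsigned FP quantization to asymmetric activation distributions.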
