Diffusion-based purification has been found to enhance empirical adversarial robustness, although the exact mechanism behind the improvement is not fully understood.
Surprisingly, purification increases the distance between inputs and their clean counterparts rather than reducing it, challenging the conventional belief that purification works by denoising inputs back toward the original data.
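This claim can be checked directly by measuring per-sample distances. The following is a minimal sketch, assuming a `purify` callable that stands in for the reverse-diffusion purifier and PyTorch tensors of shape `(batch, ...)`; the function names are illustrative, not from the paper.

```python
import torch

@torch.no_grad()
def distances_to_clean(purify, x_clean: torch.Tensor, x_adv: torch.Tensor):
    """Compare how far the adversarial inputs and their purified versions
    lie from the clean originals (per-sample L2 distance)."""
    x_pur = purify(x_adv)
    d_adv = torch.linalg.vector_norm((x_adv - x_clean).flatten(1), dim=1)
    d_pur = torch.linalg.vector_norm((x_pur - x_clean).flatten(1), dim=1)
    # The counterintuitive observation: d_pur tends to exceed d_adv,
    # i.e. purification moves inputs *farther* from the clean samples.
    return d_adv.mean().item(), d_pur.mean().item()
```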
Purified images are heavily shaped by the purifier's internal randomness: once that randomness is fixed, purification contracts the input space, mapping distinct inputs to markedly closer outputs, a compression effect that holds within each randomness configuration.
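The compression effect can be probed by freezing the purifier's internal noise. Below is a sketch under the assumption that the purifier draws all of its stochasticity from torch's global RNG, so that `torch.manual_seed` pins one randomness configuration; both helpers are hypothetical.

```python
import torch

def purify_fixed(purify, x: torch.Tensor, seed: int) -> torch.Tensor:
    """Run the stochastic purifier under one fixed noise realization
    (assumes `purify` draws its noise from torch's global RNG)."""
    torch.manual_seed(seed)
    return purify(x)

@torch.no_grad()
def contraction_factor(purify, x1: torch.Tensor, x2: torch.Tensor,
                       seed: int = 0) -> float:
    """Ratio of output distance to input distance for one pair of inputs,
    both purified under the same randomness configuration."""
    d_in = torch.linalg.vector_norm(x1 - x2)
    y1 = purify_fixed(purify, x1, seed)
    y2 = purify_fixed(purify, x2, seed)
    d_out = torch.linalg.vector_norm(y1 - y2)
    return (d_out / d_in).item()  # values < 1 indicate compression
```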
The study links the remaining robustness gain to the model's ability to compress the input space, making the compression rate a reliable indicator of robustness that requires no gradient-based analysis.
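Because the compression rate is computed from forward passes alone, it can serve as a gradient-free robustness probe. Here is a hypothetical estimator over random input pairs, again under the assumption that the purifier's noise comes from torch's global RNG; this is a sketch of the idea, not the paper's exact procedure.

```python
import torch

@torch.no_grad()
def compression_rate(purify, xs: torch.Tensor, seed: int = 0,
                     n_pairs: int = 256) -> float:
    """Estimate E[ ||f(x_i) - f(x_j)|| / ||x_i - x_j|| ] over random input
    pairs, with every sample purified under the same noise realization.
    Purely forward-pass based: no gradients of the purifier are needed."""
    ys = []
    for x in xs:
        torch.manual_seed(seed)  # same randomness configuration per sample
        ys.append(purify(x.unsqueeze(0)).squeeze(0))
    ys = torch.stack(ys)
    i = torch.randint(0, xs.shape[0], (n_pairs,))
    j = torch.randint(0, xs.shape[0], (n_pairs,))
    mask = i != j  # drop degenerate identical pairs
    i, j = i[mask], j[mask]
    d_in = torch.linalg.vector_norm((xs[i] - xs[j]).flatten(1), dim=1)
    d_out = torch.linalg.vector_norm((ys[i] - ys[j]).flatten(1), dim=1)
    return (d_out / d_in).mean().item()  # < 1 means the input space is compressed
```

A rate well below 1 would indicate strong compression and, by the study's argument, higher robustness, without ever differentiating through the purifier.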