Researchers have introduced Ambient Diffusion Omni, a framework for training diffusion models on low-quality, synthetic, and out-of-distribution images.
Diffusion models are traditionally trained on curated datasets, whereas the new approach aims to exploit the lower-quality images that are usually discarded.
The framework exploits two properties of natural images: spectral power-law decay and locality.
The researchers successfully trained diffusion models with synthetically corrupted images and achieved a state-of-the-art ImageNet FID.
Significant improvements in image quality and diversity were observed for text-to-image generative modeling.
The framework's core insight is that noise helps a model learn from biased data and mixed distributions.
The approach was validated by analyzing the trade-off between abundant but biased data and limited unbiased data across diffusion times.
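
The interplay between spectral power-law decay and noise can be illustrated with a small numerical sketch. This is not the authors' implementation; it is a toy example under stated assumptions. The image is synthetic (random phases with a 1/f^alpha power spectrum), the corruption is a hypothetical hard low-pass filter standing in for blur, and distinguishability is measured with the standard KL divergence between two Gaussians of equal variance. All function names (`radial_frequency`, `power_law_image`, `low_pass`, `kl_per_pixel`) are made up for illustration.

```python
import numpy as np


def radial_frequency(size):
    """Distance of every 2-D FFT bin from the zero frequency (cycles per pixel)."""
    fx = np.fft.fftfreq(size)[:, None]
    fy = np.fft.fftfreq(size)[None, :]
    return np.sqrt(fx**2 + fy**2)


def power_law_image(size, alpha, rng):
    """Synthetic image whose power spectrum decays roughly as 1/f^alpha (assumption)."""
    radius = radial_frequency(size)
    radius[0, 0] = 1.0                      # avoid division by zero at the DC bin
    amplitude = radius ** (-alpha / 2.0)    # amplitude is the square root of power
    amplitude[0, 0] = 0.0                   # drop the DC term so the image has zero mean
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(size, size))
    image = np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))
    return image / image.std()              # normalize to unit variance


def low_pass(image, cutoff):
    """Hypothetical corruption: zero out all frequencies above `cutoff` (a crude blur)."""
    mask = radial_frequency(image.shape[0]) <= cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))


def kl_per_pixel(x, y, sigma):
    """KL( N(x, sigma^2 I) || N(y, sigma^2 I) ) divided by the number of pixels."""
    return np.mean((x - y) ** 2) / (2.0 * sigma**2)


rng = np.random.default_rng(0)
clean = power_law_image(64, alpha=2.0, rng=rng)
blurred = low_pass(clean, cutoff=0.25)

# Share of the image energy that the low-pass corruption removed.
high_freq_fraction = np.mean((clean - blurred) ** 2) / np.mean(clean**2)
print(f"energy removed by the corruption: {high_freq_fraction:.3f}")

# As the noise level grows, noisy clean and noisy corrupted images become
# harder to tell apart, because the gap shrinks as 1 / sigma^2.
for sigma in (0.05, 0.2, 1.0):
    print(f"sigma={sigma:<4}  KL per pixel={kl_per_pixel(clean, blurred, sigma):.4f}")
```

In this toy setting, the corruption only removes high frequencies, which carry a minority of the energy when the spectrum decays as a power law. The KL gap between the noised clean image and the noised corrupted image therefore falls quickly as the noise level rises, which is one plausible reading of why corrupted data could still be informative at high diffusion times.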