Masked Diffusion Models (MDMs) are powerful tools for generating discrete data: they start from a fully masked sequence and gradually unmask tokens over time. Their sampling is inefficient, however, because many reverse steps reveal no new token, leaving the sequence unchanged while the computation for those steps is wasted.
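A minimal simulation makes the idle-step problem concrete. It assumes a linear masking schedule; the sequence length L, step count T, and the schedule itself are illustrative choices, not the paper's exact setup.

```python
import numpy as np

# Minimal sketch of MDM reverse sampling under a linear masking schedule
# (illustrative, not the paper's exact setup). A step is "idle" when no
# token is unmasked, so the expensive denoiser call changes nothing.
rng = np.random.default_rng(0)
L, T = 64, 1024                    # sequence length, sampling steps
masked = np.ones(L, dtype=bool)    # generation starts fully masked
idle = 0
for i in range(T, 0, -1):
    t, s = i / T, (i - 1) / T      # current and next noise levels
    # each still-masked token is revealed with probability (t - s) / t
    reveal = masked & (rng.random(L) < (t - s) / t)
    if not reveal.any():
        idle += 1
    masked &= ~reveal
print(f"idle steps: {idle}/{T} ({idle / T:.0%})")
```

With many more steps than tokens, the vast majority of steps unmask nothing, which is the wasted computation described above.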
Recent work has improved MDMs by refining training objectives, blending in autoregressive methods, and guiding sampling with energy-based models; Prime takes a different route, allowing tokens to occupy intermediate states by masking only sub-parts of their encoded form.
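A minimal sketch of the partial-masking idea, assuming each token id is expanded into ell base-b sub-tokens that are masked independently. The base b, sub-token count ell, the MASK sentinel, and the encode/partial_mask helpers are all hypothetical names for illustration, not the paper's implementation.

```python
import numpy as np

MASK = -1                      # sentinel for a masked sub-token (illustrative)
b, ell = 4, 3                  # base and sub-token count; covers 4**3 = 64 ids

def encode(token_id: int) -> list[int]:
    """Base-b encoding of a token id into ell sub-tokens."""
    digits = []
    for _ in range(ell):
        digits.append(token_id % b)
        token_id //= b
    return digits[::-1]

def partial_mask(sub_tokens: list[int], p: float, rng) -> list[int]:
    """Mask each sub-token independently with probability p."""
    return [MASK if rng.random() < p else s for s in sub_tokens]

rng = np.random.default_rng(0)
subs = encode(42)
print(subs, partial_mask(subs, 0.5, rng))  # some digits survive masking
```

Because masking acts per sub-token rather than per token, a token can be partially observed instead of all-or-nothing, which is what gives it intermediate states.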
MDM-Prime, an MDM equipped with the Prime scheme, achieves lower perplexity on text and competitive FID scores on image tasks, outperforming prior MDMs and autoregressive baselines.
Architecturally, MDM-Prime applies partial masking at the sub-token level, which yields finer-grained intermediate states, reduces idle sampling steps, and strengthens performance on both text and image generation.
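One way to see why sub-token masking gives finer-grained states is simple counting: a standard MDM token has only two observable states (masked or revealed), while ell sub-tokens over base b admit (b + 1)**ell combinations, most of them partial. A back-of-envelope check with the same illustrative b and ell as above:

```python
# Observable per-token states: standard MDM is all-or-nothing (2 states);
# sub-token masking admits (b + 1) ** ell combinations, mostly partial.
b, ell = 4, 3
print(2, (b + 1) ** ell)       # 2 vs. 125 states per token
```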