Discrete diffusion models have shown promise for modeling complex discrete data, with masked diffusion models offering a balance between quality and generation speed.
Variational Autoencoding Discrete Diffusion (VADD) is proposed as a framework that enhances discrete diffusion with latent variable modeling, capturing correlations among the dimensions of each data point. VADD introduces an auxiliary recognition model that enables stable training through maximization of a variational lower bound and amortized inference over the training set.
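To make the setup concrete, the sketch below shows one possible training step in this style: a recognition network infers a Gaussian latent from the clean sequence, a masked-diffusion denoiser predicts the original tokens at masked positions conditioned on that latent, and the objective combines masked reconstruction with a KL term. This is a minimal illustration under stated assumptions; the module names (`Recognition`, `Denoiser`, `vadd_step`), the sizes, and the simple uniform masking schedule are hypothetical and not the paper's actual architecture or objective weighting.

```python
# Toy sketch of a VADD-style training step (assumptions: Gaussian latent,
# simple MLP-style denoiser, uniform per-sample masking level).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ_LEN, LATENT, HIDDEN = 32, 16, 8, 128
MASK_ID = VOCAB  # extra token id reserved for the mask symbol

class Recognition(nn.Module):
    """Auxiliary recognition model q(z | x0) for amortized inference of the latent."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, HIDDEN)
        self.head = nn.Linear(HIDDEN, 2 * LATENT)  # mean and log-variance

    def forward(self, x0):
        h = self.embed(x0).mean(dim=1)              # pool over sequence positions
        mu, logvar = self.head(h).chunk(2, dim=-1)
        return mu, logvar

class Denoiser(nn.Module):
    """Masked-diffusion denoiser p(x0 | x_t, z): predicts clean tokens at masked
    positions, conditioned on a shared latent z that couples the dimensions."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, HIDDEN)
        self.z_proj = nn.Linear(LATENT, HIDDEN)
        self.out = nn.Linear(HIDDEN, VOCAB)

    def forward(self, x_t, z):
        h = self.embed(x_t) + self.z_proj(z).unsqueeze(1)
        return self.out(F.gelu(h))                  # per-position logits

def vadd_step(x0, recog, denoiser):
    """ELBO-style loss: masked reconstruction + KL(q(z|x0) || N(0, I))."""
    mu, logvar = recog(x0)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
    t = torch.rand(x0.size(0), 1)                             # masking level per sample
    mask = torch.rand_like(x0, dtype=torch.float) < t         # mask each token w.p. t
    x_t = torch.where(mask, torch.full_like(x0, MASK_ID), x0)
    logits = denoiser(x_t, z)
    recon = F.cross_entropy(logits[mask], x0[mask])           # only masked positions
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    return recon + kl

if __name__ == "__main__":
    recog, denoiser = Recognition(), Denoiser()
    x0 = torch.randint(0, VOCAB, (4, SEQ_LEN))
    loss = vadd_step(x0, recog, denoiser)
    loss.backward()
    print(f"toy VADD-style loss: {loss.item():.3f}")
```

The key design point the sketch illustrates is that all positions share one latent z, so the denoiser's per-position predictions are no longer conditionally independent given the masked input alone.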
Empirical results show that VADD consistently outperforms masked diffusion baselines in sample quality, with the largest gains when the number of denoising steps is small.