The article discusses a new framework called Denoising Multi-Beta VAE that aims to balance disentanglement and generation quality in generative models.
Traditionally, achieving interpretable latent representations in generative models comes at the expense of generation quality. The $\beta$-VAE method introduces a hyperparameter $\beta$ to manage the trade-off between disentanglement and reconstruction quality.
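As background (the article does not write the objective out), the $\beta$-VAE trains the standard VAE evidence lower bound with the KL term weighted by the hyperparameter $\beta$:

$$
\mathcal{L}_{\beta}(x) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; \beta \, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)
$$

Setting $\beta = 1$ recovers the ordinary VAE; larger values of $\beta$ push the approximate posterior toward the factorized prior, which encourages disentangled latent factors but typically blurs reconstructions.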
The Denoising Multi-Beta VAE framework aims to address the disentanglement-reconstruction quality trade-off by utilizing a range of $\beta$ values to learn multiple corresponding latent representations. It leverages a non-linear diffusion model to transition smoothly between these latent representations.
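A minimal sketch of what a multi-$\beta$ objective of this kind could look like in PyTorch is given below. The class and argument names (MultiBetaVAE, betas, the $\beta$-conditioned encoder/decoder) are illustrative assumptions rather than the authors' implementation, and the diffusion model that transitions between the $\beta$-indexed latent representations is omitted.

```python
# Illustrative assumption: a single VAE conditioned on beta, trained with the
# beta-weighted ELBO at several beta values. Not the paper's implementation;
# the denoising/diffusion transition between latent spaces is not shown.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBetaVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16, h_dim=256):
        super().__init__()
        # Encoder and decoder receive beta as an extra conditioning input (assumption).
        self.enc = nn.Sequential(nn.Linear(x_dim + 1, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim + 1, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def forward(self, x, beta):
        b = torch.full((x.size(0), 1), beta, device=x.device)
        mu, logvar = self.enc(torch.cat([x, b], dim=1)).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        x_hat = self.dec(torch.cat([z, b], dim=1))
        # beta-weighted ELBO: reconstruction term + beta * KL(q(z|x) || N(0, I))
        rec = F.binary_cross_entropy_with_logits(x_hat, x, reduction="sum") / x.size(0)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
        return rec + beta * kl

model = MultiBetaVAE()
betas = [1.0, 4.0, 16.0]            # range of beta values (illustrative choice)
x = torch.rand(32, 784)             # dummy batch of inputs in [0, 1]
loss = sum(model(x, beta) for beta in betas) / len(betas)
loss.backward()
```

Conditioning the networks on $\beta$ is one way a single model could expose a family of latent representations; per the article, the proposed framework additionally learns a non-linear diffusion model to move smoothly between them.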
The proposed framework is evaluated for its disentanglement and generation quality, showing promising results in achieving both sharp reconstructions and consistent manipulation of generated outputs with respect to changes in $\beta$.