Explicit density learners are gaining popularity as generative models because they model the probability density directly, offering advantages over Generative Adversarial Networks.
Normalizing flows use bijective functions to transform a simple base distribution into a complex target distribution with a tractable density, but they can be challenging to train and may yield lower sampling quality.
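For reference, the tractability comes from the standard change-of-variables identity (a known result, not specific to this work): for a bijection $f$ mapping data $x$ to a base variable $z = f(x)$ with base density $p_Z$,

$$\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|.$$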
Novel knowledge distillation techniques are introduced to improve sampling quality and density estimation in smaller student normalizing flows.
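The paper does not spell out the loss here, so the following is only a minimal sketch of one plausible density-matching distillation setup, not the authors' method. `teacher` and `student` are hypothetical flow objects assumed to expose a `torch.distributions`-style `sample` / `log_prob` interface; the combination of a forward-KL estimate with a squared log-density matching term is an assumption for illustration.

```python
# Hypothetical sketch: distilling a large teacher flow into a smaller student flow.
# Assumes both models expose sample(shape) and log_prob(x), as in torch.distributions.
import torch

def distillation_loss(teacher, student, n_samples=256, alpha=0.5):
    """Estimate a distillation objective on teacher samples (assumed formulation)."""
    with torch.no_grad():
        x = teacher.sample((n_samples,))        # draw samples from the teacher
        teacher_logp = teacher.log_prob(x)      # teacher log-density at those samples
    student_logp = student.log_prob(x)          # student log-density at the same points
    kl_term = (teacher_logp - student_logp).mean()             # Monte Carlo forward-KL estimate
    match_term = (teacher_logp - student_logp).pow(2).mean()   # direct log-density matching
    return kl_term + alpha * match_term

# Typical training step for a smaller student (e.g. fewer coupling layers):
#   optimizer.zero_grad()
#   loss = distillation_loss(teacher, student)
#   loss.backward()
#   optimizer.step()
```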
The study explores knowledge distillation in Compositional Normalizing Flows, showing significant performance gains and increased throughput with smaller models.