Modern reinforcement learning (RL) algorithms have achieved success by using powerful probabilistic models such as transformers, energy-based models, and diffusion/flow-based models.
Normalizing flows (NFs) offer an alternative to these models: they provide exact likelihoods and single-pass sampling, without the computational cost of solving differential equations or the sequential decoding of autoregressive architectures.
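To make this contrast concrete, below is a minimal sketch of a RealNVP-style affine coupling flow. This is our illustrative choice, not the proposed architecture; the names `Coupling` and `NormalizingFlow` and all hyperparameters are assumptions. It shows why NFs yield exact log-likelihoods via the change-of-variables formula and draw samples in one pass through the inverted layers.

```python
import torch
import torch.nn as nn

class Coupling(nn.Module):
    """One affine coupling layer: rescales half the dimensions
    conditioned on the other half (assumes an even `dim`)."""
    def __init__(self, dim, hidden=64, flip=False):
        super().__init__()
        self.flip = flip
        half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * half),  # predicts scale and shift
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        if self.flip:
            x1, x2 = x2, x1
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)  # bound the log-scale for stability
        y2 = x2 * torch.exp(s) + t
        y = torch.cat([x1, y2] if not self.flip else [y2, x1], dim=-1)
        return y, s.sum(-1)  # log|det J| of the triangular Jacobian

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        if self.flip:
            y1, y2 = y2, y1
        s, t = self.net(y1).chunk(2, dim=-1)
        s = torch.tanh(s)
        x2 = (y2 - t) * torch.exp(-s)  # invert the affine map exactly
        return torch.cat([y1, x2] if not self.flip else [x2, y1], dim=-1)

class NormalizingFlow(nn.Module):
    def __init__(self, dim, n_layers=4):
        super().__init__()
        self.dim = dim
        self.layers = nn.ModuleList(
            [Coupling(dim, flip=bool(i % 2)) for i in range(n_layers)]
        )
        self.base = torch.distributions.Normal(0.0, 1.0)

    def log_prob(self, x):
        # Exact density via change of variables:
        # log p(x) = log p_base(f(x)) + sum_k log|det J_k|
        log_det = x.new_zeros(x.shape[0])
        for layer in self.layers:
            x, ld = layer(x)
            log_det = log_det + ld
        return self.base.log_prob(x).sum(-1) + log_det

    def sample(self, n):
        # One pass through the inverted layers: no ODE solver,
        # no token-by-token decoding.
        z = self.base.sample((n, self.dim))
        for layer in reversed(self.layers):
            z = layer.inverse(z)
        return z
```

Training such a model is plain maximum likelihood on observed data, and both density evaluation and sampling cost one network pass per layer.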
A single proposed NF architecture integrates seamlessly into RL algorithms, serving as a policy, Q-function, and occupancy measure; this unification simplifies the algorithms and improves performance in imitation learning, offline RL, goal-conditioned RL, and unsupervised RL.
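As a hypothetical illustration of the policy role only, the sketch below conditions each coupling layer on the state by concatenation. The conditioning scheme and the names `ConditionalCoupling` and `FlowPolicy` are our assumptions, not the paper's design. The point is that a single network both samples actions and scores exact log π(a|s).

```python
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """Affine coupling whose scale/shift net also sees the state
    (assumes an even `act_dim`)."""
    def __init__(self, act_dim, state_dim, hidden=64, flip=False):
        super().__init__()
        self.flip = flip
        half = act_dim // 2
        self.net = nn.Sequential(
            nn.Linear(half + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * half),
        )

    def forward(self, a, s):
        a1, a2 = a.chunk(2, dim=-1)
        if self.flip:
            a1, a2 = a2, a1
        scale, shift = self.net(torch.cat([a1, s], dim=-1)).chunk(2, dim=-1)
        scale = torch.tanh(scale)
        b2 = a2 * torch.exp(scale) + shift
        out = torch.cat([a1, b2] if not self.flip else [b2, a1], dim=-1)
        return out, scale.sum(-1)

    def inverse(self, b, s):
        b1, b2 = b.chunk(2, dim=-1)
        if self.flip:
            b1, b2 = b2, b1
        scale, shift = self.net(torch.cat([b1, s], dim=-1)).chunk(2, dim=-1)
        scale = torch.tanh(scale)
        a2 = (b2 - shift) * torch.exp(-scale)
        return torch.cat([b1, a2] if not self.flip else [a2, b1], dim=-1)

class FlowPolicy(nn.Module):
    def __init__(self, act_dim, state_dim, n_layers=4):
        super().__init__()
        self.act_dim = act_dim
        self.layers = nn.ModuleList(
            [ConditionalCoupling(act_dim, state_dim, flip=bool(i % 2))
             for i in range(n_layers)]
        )
        self.base = torch.distributions.Normal(0.0, 1.0)

    def log_prob(self, action, state):
        # Exact log pi(a|s): usable directly for behavior cloning or
        # entropy terms, with no ELBO or score-matching surrogate.
        log_det = action.new_zeros(action.shape[0])
        z = action
        for layer in self.layers:
            z, ld = layer(z, state)
            log_det = log_det + ld
        return self.base.log_prob(z).sum(-1) + log_det

    def sample(self, state):
        # One inverse pass through the layers produces an action.
        z = self.base.sample((state.shape[0], self.act_dim))
        for layer in reversed(self.layers):
            z = layer.inverse(z, state)
        return z
```

Because the density is exact, the same object can be trained by maximum likelihood (as in imitation learning) or plugged into losses that require log-probabilities, which is the kind of reuse the single-architecture claim points to.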