Generative models such as diffusion and flow matching provide expressive policies for offline reinforcement learning, but they typically require many iterative sampling steps to produce an action at inference time.
The Single-Step Completion Policy (SSCP) addresses this by training the generative policy to predict a direct completion vector, enabling accurate one-shot action generation. SSCP combines the expressiveness of generative models with the efficiency of unimodal policies, improving both training and inference speed without backpropagating through long sampling chains.
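The following is a minimal sketch of what a completion-vector objective and one-shot sampler could look like, assuming a flow-matching-style linear interpolant between Gaussian noise and the dataset action; the class and function names (`CompletionPolicy`, `completion_loss`, `act`) and the exact parameterization are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's exact method): the network predicts the
# vector that completes the interpolation path directly to the final action.
import torch
import torch.nn as nn

class CompletionPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, x_t, t):
        # Predict the completion vector from the partially generated action x_t
        # (at interpolation time t) to the final action.
        return self.net(torch.cat([state, x_t, t], dim=-1))

def completion_loss(policy, state, action):
    # Flow-matching-style linear interpolant between noise x0 and the dataset action.
    x0 = torch.randn_like(action)
    t = torch.rand(action.shape[0], 1)
    x_t = (1 - t) * x0 + t * action
    target = action - x_t              # vector that "completes" the path in one jump
    pred = policy(state, x_t, t)
    return ((pred - target) ** 2).mean()

@torch.no_grad()
def act(policy, state):
    # One-shot generation: start from pure noise (t = 0) and apply the predicted
    # completion vector in a single forward pass, with no iterative sampling.
    x0 = torch.randn(state.shape[0], policy.net[-1].out_features)
    t = torch.zeros(state.shape[0], 1)
    return x0 + policy(state, x0, t)
```

Because inference is a single forward pass, the action can be used directly inside a critic or goal-conditioned objective without differentiating through a multi-step sampler.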
SSCP not only performs well on standard offline RL and behavior-cloning benchmarks but also extends to goal-conditioned RL, making it a versatile and efficient framework for deep RL and sequential decision-making.