Diffusion policies are widely used in decision-making scenarios like robotics, gaming, and autonomous driving for learning diverse skills.
Existing diffusion policies could be sub-optimal due to limited demonstration data, leading to sub-optimal trajectories and failures.
A new framework called NCDPO has been introduced to address challenges in fine-tuning diffusion policies, enabling tractable likelihood evaluation and gradient backpropagation through all diffusion timesteps.
Experiments demonstrate that NCDPO achieves comparable sample efficiency to traditional methods and outperforms them in both efficiency and final performance across various benchmarks.