Source: arXiv

Policy-Based Trajectory Clustering in Offline Reinforcement Learning

  • Researchers have introduced a novel task of clustering trajectories from offline reinforcement learning datasets, where each cluster center represents the policy that generated the trajectories in that cluster.
  • The clustering objective is formulated as the KL-divergence between the offline trajectory distribution and a mixture of policy-induced trajectory distributions (a sketch of this objective appears after the list).
  • To address this task, Policy-Guided K-means (PG-Kmeans) and Centroid-Attracted Autoencoder (CAAE) are proposed.
  • PG-Kmeans alternates between training behavior-cloning policies and assigning each trajectory to the policy most likely to have generated it, while CAAE pulls the latent representations of trajectories toward specific codebook entries to form clusters (see the code sketch after the list).
  • The finite-step convergence of PG-Kmeans is proven theoretically, and the analysis highlights a key challenge of offline trajectory clustering: policy-induced conflicts, where the same trajectory could plausibly have been generated by more than one policy, making cluster assignments ambiguous.
  • Experimental validation on the D4RL dataset and custom GridWorld environments demonstrates the effectiveness of PG-Kmeans and CAAE in partitioning trajectories into meaningful clusters.
  • The research suggests that these methods offer a promising framework for policy-based trajectory clustering, applicable in offline RL and beyond.
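As a rough illustration of the objective described above only: a KL-divergence between an offline trajectory distribution and a mixture of policy-induced distributions is commonly written in the form below. The symbols (mixture weights w_k, component policies pi_k, offline trajectory distribution p_D, dynamics P, initial-state distribution rho) are illustrative notation, not taken from the paper.

\min_{\{w_k\},\,\{\pi_k\}} \; \mathrm{KL}\!\left( p_{\mathcal{D}}(\tau) \;\Big\|\; \sum_{k=1}^{K} w_k\, p_{\pi_k}(\tau) \right),
\qquad
p_{\pi}(\tau) = \rho(s_0) \prod_{t} \pi(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t).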

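The snippet below is a minimal sketch of the assign-then-refit loop described in the PG-Kmeans bullet, assuming discrete states and actions and a simple tabular behavior-cloning policy; the helper names (fit_bc_policy, log_likelihood, pg_kmeans) are hypothetical and are not from the paper.

import numpy as np

# A trajectory is a list of (state, action) pairs.

def fit_bc_policy(trajectories):
    """Fit a behavior-cloning policy pi(a|s) to the given trajectories:
    here, a tabular count of actions per state."""
    counts = {}
    for traj in trajectories:
        for s, a in traj:
            counts.setdefault(s, {}).setdefault(a, 0)
            counts[s][a] += 1
    return counts

def log_likelihood(policy, traj, n_actions, eps=1.0):
    """Log-probability of the actions in traj under the tabular policy,
    with add-one style smoothing so unseen (s, a) pairs keep finite mass."""
    ll = 0.0
    for s, a in traj:
        sa = policy.get(s, {})
        total = sum(sa.values()) + eps * n_actions
        ll += np.log((sa.get(a, 0) + eps) / total)
    return ll

def pg_kmeans(trajectories, k, n_actions, n_iters=20, seed=0):
    """Alternate between (1) assigning each trajectory to the policy most
    likely to have generated it and (2) refitting one BC policy per cluster."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(0, k, size=len(trajectories))
    for _ in range(n_iters):
        policies = [fit_bc_policy([t for t, c in zip(trajectories, assign) if c == j])
                    for j in range(k)]
        new_assign = np.array([
            int(np.argmax([log_likelihood(p, t, n_actions) for p in policies]))
            for t in trajectories
        ])
        if np.array_equal(new_assign, assign):  # assignments stopped changing
            break
        assign = new_assign
    return assign, policies

In practice the tabular estimate would be replaced by a learned behavior-cloning model, but the alternation between likelihood-based assignment and per-cluster refitting is the idea the bullet describes.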