Source: Arxiv

Learning The Minimum Action Distance

  • This paper introduces a framework for learning a state representation for Markov decision processes solely from state trajectories.
  • The framework does not require reward signals or actions executed by the agent.
  • The proposed framework centers on learning the minimum action distance (MAD), defined as the minimum number of actions required to transition between two states.
  • MAD serves as a fundamental metric capturing the environment's structure and assists in goal-conditioned reinforcement learning and reward shaping.
  • The self-supervised learning approach constructs an embedding space in which distances between states correspond to their MAD (a rough sketch of this idea follows the list).
  • The approach is evaluated on various environments with known MAD values, including deterministic and stochastic dynamics, discrete and continuous state spaces, and noisy observations.
  • Empirical results show that the proposed method efficiently learns accurate MAD representations and outperforms existing state-representation methods in representation quality.
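
As a rough illustration of the embedding idea above (not the authors' code): the sketch below assumes PyTorch, assumes states arrive as flat feature vectors, and uses hypothetical names (StateEncoder, mad_loss, train). It relies on the observation that the step gap between two states seen in the same trajectory is an upper bound on their MAD, so an asymmetric loss that penalizes overestimates more strongly nudges the learned embedding distance toward the minimum observed gap.

```python
# Minimal sketch: learn an embedding whose pairwise distances approximate the
# Minimum Action Distance (MAD) using only state trajectories (no rewards, no actions).
import random
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    def __init__(self, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def mad_loss(dist: torch.Tensor, gap: torch.Tensor, over_weight: float = 5.0):
    # gap (steps between the two states in one trajectory) upper-bounds MAD,
    # so overestimates (dist > gap) are penalized more than underestimates.
    err = dist - gap
    return torch.where(err > 0, over_weight * err.pow(2), err.pow(2)).mean()

def train(encoder, trajectories, steps=1000, batch=256, lr=1e-3):
    # trajectories: list of trajectories, each a list (len >= 2) of state tensors.
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(steps):
        s_a, s_b, gaps = [], [], []
        for _ in range(batch):
            traj = random.choice(trajectories)
            i, j = sorted(random.sample(range(len(traj)), 2))
            s_a.append(traj[i]); s_b.append(traj[j]); gaps.append(float(j - i))
        s_a, s_b = torch.stack(s_a), torch.stack(s_b)
        gaps = torch.tensor(gaps)
        dist = (encoder(s_a) - encoder(s_b)).norm(dim=-1)  # embedding distance
        loss = mad_loss(dist, gaps)
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder
```

The asymmetric weighting carries the assumption: each within-trajectory gap only upper-bounds the true MAD, so weighting overestimates more heavily biases the learned distance toward the smallest gap observed across all trajectories.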
