Arxiv

A Unified Theory of Compositionality, Modularity, and Interpretability in Markov Decision Processes

  • Researchers introduce Option Kernel Bellman Equations (OKBEs) for a new reward-free Markov Decision Process.
  • OKBEs directly optimize a predictive map called a state-time option kernel (STOK) to maximize goal completion probability while avoiding constraint violations.
  • STOKs are compositional, modular, and interpretable initiation-to-termination transition kernels for policies in the Options Framework of Reinforcement Learning.
  • STOKs can be composed using Chapman-Kolmogorov equations for spatiotemporal predictions over long horizons and can be efficiently represented in a factorized and reconfigurable form.
  • STOKs record the probabilities of goal-success and constraint-violation events, which is crucial for formal verification.
  • High-dimensional state models can be decomposed using local STOKs and goal-conditioned policies aggregated into a factorized goal kernel for solving complex planning problems.
  • The approach enables forward planning at the goal level in high dimensions, yielding flexible agents that can rapidly synthesize meta-policies and reuse planning representations.
  • OKBEs support verifiable long-horizon planning and intrinsic motivation in dynamic, high-dimensional world models.
  • Researchers argue that reward-maximization conflicts with compositionality, modularity, and interpretability in reinforcement learning.
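
To make the composition idea above concrete, here is a minimal sketch of chaining two transition kernels with the Chapman-Kolmogorov equation. It is a generic discrete example, not the paper's STOK construction: the four-state layout, the `compose` helper, and the probabilities are all assumptions chosen for illustration. Two absorbing states stand in for the goal-success and constraint-violation events that the summary says STOKs record.

```python
# Illustrative only: a generic Chapman-Kolmogorov composition of two
# discrete transition kernels. This is NOT the paper's STOK definition;
# the states, names, and numbers below are hypothetical.

def compose(first, second):
    """Return the kernel for executing `first`, then `second`.

    Each kernel is a row-stochastic matrix (list of lists) where
    kernel[i][j] is the probability of ending in state j when started in i.
    Chapman-Kolmogorov: composed[i][k] = sum_j first[i][j] * second[j][k].
    """
    n = len(first)
    return [
        [sum(first[i][j] * second[j][k] for j in range(n)) for k in range(n)]
        for i in range(n)
    ]

# Hypothetical layout: states 0 and 1 are ordinary states, state 2 is an
# absorbing "goal reached" event, state 3 an absorbing "constraint violated" event.
GOAL, VIOLATION = 2, 3

option_a = [
    [0.0, 0.9, 0.0, 0.1],  # from state 0: reach state 1, or violate a constraint
    [0.0, 1.0, 0.0, 0.0],  # from state 1: stay put
    [0.0, 0.0, 1.0, 0.0],  # goal is absorbing
    [0.0, 0.0, 0.0, 1.0],  # violation is absorbing
]
option_b = [
    [1.0, 0.0, 0.0, 0.0],  # from state 0: stay put
    [0.0, 0.0, 0.8, 0.2],  # from state 1: reach the goal, or violate a constraint
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Long-horizon prediction for "run option A, then option B" from state 0.
chained = compose(option_a, option_b)
p_goal = chained[0][GOAL]            # 0.9 * 0.8
p_violation = chained[0][VIOLATION]  # 0.1 + 0.9 * 0.2
```

Because the event states are absorbing, the composed kernel keeps track of how much probability mass has already ended in success or violation, which is the kind of quantity a verifier would check against a threshold.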
