Researchers introduce Option Kernel Bellman Equations (OKBEs) for a new reward-free Markov Decision Process formulation.
OKBEs directly optimize a predictive map called a state-time option kernel (STOK) to maximize goal completion probability while avoiding constraint violations.
STOKs are compositional, modular, and interpretable initiation-to-termination transition kernels for policies in the Options Framework of Reinforcement Learning.
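As a rough illustration (a minimal sketch under standard Options Framework assumptions, not necessarily the paper's exact formulation), consider a discrete MDP with transition model $P$ and an option $o$ with policy $\pi^o$ and termination function $\beta^o$, and write $\kappa^o(s',\tau \mid s)$ for the probability that $o$, initiated in state $s$, terminates in state $s'$ after exactly $\tau$ steps. Such an initiation-to-termination kernel obeys a Bellman-style recursion:

$$
\kappa^o(s',\tau \mid s) \;=\; \sum_{\tilde{s}} P\!\left(\tilde{s} \mid s, \pi^o(s)\right)
\Big[ \beta^o(\tilde{s})\,\mathbb{1}[\tilde{s}=s',\ \tau=1] \;+\; \big(1-\beta^o(\tilde{s})\big)\,\kappa^o(s',\tau-1 \mid \tilde{s}) \Big], \qquad \tau \ge 1 .
$$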
STOKs can be composed via the Chapman-Kolmogorov equations to make spatiotemporal predictions over long horizons, and they can be represented efficiently in a factorized, reconfigurable form.
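A minimal sketch of such a composition, assuming STOKs over a finite state space and horizon are stored as dense arrays kappa[s_next, t, s]; the array layout and function name are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def compose_stoks(kappa_b, kappa_a):
    """Chapman-Kolmogorov composition of two state-time option kernels.

    kappa_a[s1, t, s0]: P(option A, initiated in s0, terminates in s1 after t steps)
    kappa_b[s2, t, s1]: the same quantity for option B.
    Returns kappa_ab[s2, t, s0]: P(running A then B ends in s2 after t total steps),
    obtained by marginalizing the intermediate state and convolving over time.
    """
    n_states, horizon, _ = kappa_a.shape
    kappa_ab = np.zeros((n_states, horizon, n_states))
    for t in range(horizon):
        for u in range(t + 1):
            # sum over the intermediate state s1: (s2 <- s1) @ (s1 <- s0)
            kappa_ab[:, t, :] += kappa_b[:, t - u, :] @ kappa_a[:, u, :]
    return kappa_ab
```

Repeating this composition chains options into a single kernel, which is one way such spatiotemporal predictions could be rolled out over long horizons.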
STOKs record the probabilities of goal-success and constraint-violation events, which is crucial for formal verification.
High-dimensional state models can be decomposed into local STOKs with goal-conditioned policies, which are then aggregated into a factorized goal kernel for solving complex planning problems.
The approach enables forward planning at the goal level in high dimensions, yielding flexible agents that can rapidly synthesize meta-policies and reuse planning representations.
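As a hedged sketch of what goal-level forward planning could look like, assuming the factorized goal kernel is summarized as a matrix goal_kernel[g_next, g] of option success probabilities; the matrix layout, product-of-successes objective, and function name below are assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

def plan_over_goals(goal_kernel, start, target, max_depth=10):
    """Forward planning over subgoals using a goal-level success kernel.

    goal_kernel[g_next, g]: probability that the option conditioned on goal
    g_next, launched after achieving goal g, completes g_next successfully.
    Returns a subgoal sequence from start to target together with its
    estimated success probability (product of per-option success probabilities).
    """
    n_goals = goal_kernel.shape[0]
    # value[g]: best probability of eventually completing `target` from goal g
    value = np.zeros(n_goals)
    value[target] = 1.0
    for _ in range(max_depth):
        # probability-maximizing dynamic programming over the goal graph
        value = np.maximum(value, (goal_kernel * value[:, None]).max(axis=0))
        value[target] = 1.0
    # greedily extract a meta-policy: from each goal, pick the best next subgoal
    path, g, prob = [start], start, 1.0
    for _ in range(max_depth):
        if g == target:
            break
        g_next = int(np.argmax(goal_kernel[:, g] * value))
        prob *= goal_kernel[g_next, g]
        path.append(g_next)
        g = g_next
    return path, prob
```

The point of the sketch is the reuse: once local kernels are aggregated into a goal kernel, a new target only requires re-running the cheap goal-level search, not re-solving the underlying high-dimensional problem.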
OKBEs support verifiable long-horizon planning and intrinsic motivation in dynamic, high-dimensional world models.
Researchers argue that reward maximization conflicts with compositionality, modularity, and interpretability in reinforcement learning.