Long-term planning in reinforcement learning involves finding strategies whose actions collectively work toward a goal rather than solely optimizing immediate outcomes.
Strategic link scores quantify the dependencies between planned actions in a strategy: each score measures how much the likelihood of one decision drops when a follow-up decision is no longer available.
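The definition above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method described here: the toy agent picks a first action with a softmax over action values, each first action is valued by its best still-available follow-up, and the names `first_action_probs`, `strategic_link_score`, and `followup_values` are hypothetical.

```python
import math


def softmax(values):
    """Turn a dict of action values into a dict of probabilities."""
    m = max(values.values())
    exps = {k: math.exp(v - m) for k, v in values.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def first_action_probs(followup_values, blocked=frozenset()):
    """Probability of each first action in a toy two-step decision problem.

    Assumption: a first action is worth the best value among its follow-ups
    that are not blocked (0.0 if none remain).
    """
    values = {}
    for a1, followups in followup_values.items():
        available = [v for a2, v in followups.items() if (a1, a2) not in blocked]
        values[a1] = max(available, default=0.0)
    return softmax(values)


def strategic_link_score(followup_values, a1, a2):
    """Drop in the likelihood of a1 when follow-up a2 is made unavailable."""
    base = first_action_probs(followup_values)[a1]
    intervened = first_action_probs(followup_values, blocked={(a1, a2)})[a1]
    return base - intervened


# Hypothetical values: going "left" is attractive mainly because of "bridge".
followup_values = {
    "left": {"bridge": 2.0, "tunnel": 0.5},
    "right": {"highway": 1.0},
}

strong_link = strategic_link_score(followup_values, "left", "bridge")
weak_link = strategic_link_score(followup_values, "left", "tunnel")
print(f"left -> bridge: {strong_link:.3f}")
print(f"left -> tunnel: {weak_link:.3f}")
```

In this toy setup, removing "bridge" makes "left" much less likely, so that pair gets a positive score, while removing "tunnel" leaves the choice unchanged and scores zero. For a black-box agent, the same difference would presumably be estimated from observed behavior before and after the intervention rather than from known action values.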
The utility of strategic link scores is demonstrated in three practical applications: explaining black-box RL agents, improving decision support systems, and characterizing planning processes of non-RL agents through interventions.
For instance, one application analyzes a traffic simulator: road closures serve as interventions, and the resulting changes in drivers' emergent routing behavior are used to measure their effective planning horizon.