Reinforcement learning agents can achieve superhuman performance, but their decisions are often difficult to interpret.
A theoretical framework is developed that explains a reinforcement learning agent's decisions in terms of the influence of individual state features.
Three core elements of the agent-environment interaction that benefit from explanation are identified: behavior, performance, and value estimation.
The framework uses Shapley values from cooperative game theory to identify the influence of each state feature, offering mathematically grounded explanations with clear semantics.
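To make the Shapley-value attribution concrete, the sketch below computes exact Shapley values for the features of a state under a value function. This is a generic illustration, not the paper's own implementation: the value function, the per-feature baseline used to "remove" absent features, and all names here are assumptions for the example. Each feature's Shapley value is its average marginal contribution over all coalitions of the remaining features.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, state, baseline):
    """Exact Shapley values of state features under value_fn.

    Features outside a coalition are replaced by the corresponding
    entry of `baseline` (one common masking choice; the framework is
    agnostic to how absent features are handled).
    """
    n = len(state)
    features = range(n)

    def v(coalition):
        # Masked state: coalition features keep their values,
        # the rest fall back to the baseline.
        masked = [state[i] if i in coalition else baseline[i] for i in features]
        return value_fn(masked)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (v(coalition | {i}) - v(coalition))
    return phi

# Toy linear value function: each feature's attribution equals its
# weighted deviation from the baseline, and the attributions sum to
# v(state) - v(baseline) (the efficiency property).
weights = [2.0, -1.0, 0.5]
value_fn = lambda s: sum(w * x for w, x in zip(weights, s))
phi = shapley_values(value_fn, state=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

On a linear value function the decomposition is exact feature-by-feature, which is what gives Shapley-based explanations their clear semantics: the attributions always sum to the difference between the explained prediction and the baseline prediction.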