<ul><li>Hidden confounders can bias policy learning in reinforcement learning algorithms by influencing both states and actions.</li><li>DoSAC (Do-Calculus Soft Actor-Critic with Backdoor Adjustment) is proposed to correct for hidden confounding via causal intervention estimation.</li><li>DoSAC estimates the interventional policy using the backdoor criterion without needing access to true confounders or causal labels.</li><li>Empirical results on continuous control benchmarks demonstrate that DoSAC outperforms baselines under confounded settings, with improved robustness, generalization, and policy reliability.</li></ul>
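To make the backdoor adjustment mentioned above concrete, here is a minimal sketch on a toy discrete model. It is not DoSAC itself: it assumes the adjustment variable Z is observed, whereas the summary states that DoSAC works without access to the true confounders. All names (`p_z`, `p_a_given_z`, `p_y1_given_az`) and probabilities are hypothetical, chosen only to show how conditioning on an action differs from intervening on it.

```python
# Toy model with a binary confounder Z, binary action A, binary outcome Y.
# All probabilities below are made up for illustration.
p_z = {0: 0.5, 1: 0.5}                      # P(Z=z)
p_a_given_z = {0: {0: 0.8, 1: 0.2},         # P(A=a | Z=z), indexed [z][a]
               1: {0: 0.2, 1: 0.8}}
p_y1_given_az = {(0, 0): 0.1, (0, 1): 0.5,  # P(Y=1 | A=a, Z=z), indexed (a, z)
                 (1, 0): 0.3, (1, 1): 0.7}


def observational(a):
    """P(Y=1 | A=a): conditioning on A implicitly reweights Z by P(z | a)."""
    p_a = sum(p_a_given_z[z][a] * p_z[z] for z in p_z)
    joint = sum(p_y1_given_az[(a, z)] * p_a_given_z[z][a] * p_z[z] for z in p_z)
    return joint / p_a


def backdoor_adjusted(a):
    """P(Y=1 | do(A=a)) = sum_z P(Y=1 | a, z) * P(z) (backdoor adjustment)."""
    return sum(p_y1_given_az[(a, z)] * p_z[z] for z in p_z)


naive_effect = observational(1) - observational(0)
causal_effect = backdoor_adjusted(1) - backdoor_adjusted(0)
print(f"naive contrast:    {naive_effect:.2f}")
print(f"adjusted contrast: {causal_effect:.2f}")
```

In this toy model the action raises the outcome probability by 0.2 within each stratum of Z, and the adjusted contrast recovers that value. The naive contrast is larger because Z influences both the action and the outcome.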

Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic
