Hidden confounders can bias policy learning in reinforcement learning algorithms by influencing both states and actions.
DoSAC (Do-Calculus Soft Actor-Critic with Backdoor Adjustment) is proposed to correct for hidden confounding by estimating the effects of causal interventions.
DoSAC estimates the interventional policy using the backdoor criterion without needing access to true confounders or causal labels.
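To make the backdoor idea concrete, the sketch below applies the standard backdoor adjustment formula, P(y | do(x)) = sum_z P(y | x, z) P(z), to a small discrete model. It is a generic illustration, not DoSAC's implementation. The variables Z, X, Y and all probability values are hypothetical. The adjustment variable Z is assumed to be observed here, whereas DoSAC is stated to work without access to the true confounders.

```python
# Hypothetical toy model (not from the paper): binary Z, X, Y with
# Z -> X, Z -> Y and X -> Y, so Z confounds the effect of X on Y.
P_Z = [0.5, 0.5]                  # P(Z=z)
P_X1_GIVEN_Z = [0.2, 0.8]         # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = [[0.1, 0.6],      # P(Y=1 | X=0, Z=z)
                 [0.4, 0.9]]      # P(Y=1 | X=1, Z=z)


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    return P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]


def observational(x):
    """P(Y=1 | X=x): Z is weighted by the posterior P(Z | X=x)."""
    joint = [p_x_given_z(x, z) * P_Z[z] for z in (0, 1)]  # P(X=x, Z=z)
    p_x = sum(joint)
    return sum(P_Y1_GIVEN_XZ[x][z] * joint[z] / p_x for z in (0, 1))


def backdoor(x):
    """P(Y=1 | do(X=x)): Z is weighted by its marginal P(Z)."""
    return sum(P_Y1_GIVEN_XZ[x][z] * P_Z[z] for z in (0, 1))


naive_effect = observational(1) - observational(0)
adjusted_effect = backdoor(1) - backdoor(0)
print(f"naive: {naive_effect:.2f}, backdoor-adjusted: {adjusted_effect:.2f}")
# prints "naive: 0.60, backdoor-adjusted: 0.30"
```

In this toy model the true effect of X on Y is 0.30 by construction. Conditioning on X alone doubles the estimate because Z raises both the chance that X=1 and the chance that Y=1. Reweighting by the marginal P(Z) removes that spurious contribution.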
Empirical results on continuous control benchmarks show that DoSAC outperforms baselines in confounded settings, with improved robustness, generalization, and policy reliability.