This paper focuses on the out-of-distribution (OOD) generalization of self-supervised learning (SSL).
An analysis of mini-batch construction during SSL training reveals one explanation for SSL's limited OOD generalization: SSL learns spurious correlations present in the training batches, which degrades performance under distribution shift.
To address this issue, a post-intervention distribution (PID) grounded in a structural causal model is proposed, together with a batch sampling strategy that enforces PID constraints so as to attain optimal worst-case OOD performance.
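As a minimal illustration of the idea (not the paper's actual algorithm), a PID-style batch sampler can sever a spurious class-environment correlation by drawing environments uniformly within each class, so that each mini-batch approximates a distribution in which the spurious variable is independent of the label. All names here (`labels`, `envs`, `pid_batch_sampler`) are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def pid_batch_sampler(
    labels: Sequence[int],
    envs: Sequence[int],
    batch_size: int,
    seed: int = 0,
) -> List[List[int]]:
    """Sketch of a sampler whose batches approximate a post-intervention
    distribution: environments are drawn uniformly within each class,
    breaking any spurious class-environment correlation in the data."""
    rng = random.Random(seed)

    # Group sample indices by (class, environment).
    groups: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (y, e) in enumerate(zip(labels, envs)):
        groups[(y, e)].append(idx)

    classes = sorted({y for y, _ in groups})
    envs_per_class = {
        y: sorted(e for (yy, e) in groups if yy == y) for y in classes
    }

    n_batches = len(labels) // batch_size
    batches = []
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            y = rng.choice(classes)             # uniform over classes
            e = rng.choice(envs_per_class[y])   # uniform over envs given class
            batch.append(rng.choice(groups[(y, e)]))
        batches.append(batch)
    return batches
```

In this sketch, sampling environments uniformly conditioned on the class is one concrete way to realize a "do-intervention" on the spurious variable at the batch level; the paper's actual PID constraints may be defined differently.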