Deep learning models often learn spurious correlations between prediction targets and non-essential features, resulting in spurious bias that degrades performance on data where these correlations do not hold.
Existing methods for mitigating spurious bias require group labels, which are costly to annotate and may still miss subtle biases, such as a model relying on specific pixels for its predictions.
ShortcutProbe, a newly proposed framework, mitigates spurious bias without relying on group labels: it identifies prediction shortcuts in a model's latent space and then retrains the model to be robust to them.
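To make the two-stage idea concrete, below is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (probe_shortcut_directions, retrain_for_robustness), the use of learned latent perturbation vectors as stand-ins for prediction shortcuts, and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def probe_shortcut_directions(backbone, classifier, loader, feat_dim,
                              num_dirs=4, epochs=1, lr=1e-2, device="cpu"):
    """Learn latent-space directions whose addition flips the model's
    predictions. Label-irrelevant directions that nonetheless change the
    output act as stand-ins for prediction shortcuts (an assumption of
    this sketch, not necessarily the paper's exact probing objective)."""
    dirs = nn.Parameter(0.01 * torch.randn(num_dirs, feat_dim, device=device))
    opt = torch.optim.Adam([dirs], lr=lr)
    backbone.eval()
    classifier.eval()
    for p in classifier.parameters():
        p.requires_grad_(False)  # the probe optimizes only the directions
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = backbone(x)                 # (B, feat_dim) latent features
            z_pert = z.unsqueeze(1) + dirs      # (B, num_dirs, feat_dim)
            logits = classifier(z_pert.reshape(-1, feat_dim))
            y_rep = y.repeat_interleave(num_dirs)
            # Maximize loss on the true labels (prediction-flipping directions)
            # while an L2 penalty keeps the perturbations small.
            loss = -F.cross_entropy(logits, y_rep) + 0.1 * dirs.pow(2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return dirs.detach()


def retrain_for_robustness(backbone, classifier, loader, shortcut_dirs,
                           epochs=1, lr=1e-3, device="cpu"):
    """Fine-tune the classifier head so its predictions stay correct under
    the probed shortcut perturbations; the backbone stays frozen."""
    for p in classifier.parameters():
        p.requires_grad_(True)   # re-enable head training after probing
    opt = torch.optim.Adam(classifier.parameters(), lr=lr)
    backbone.eval()
    classifier.train()
    num_dirs, feat_dim = shortcut_dirs.shape
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = backbone(x)
            z_pert = (z.unsqueeze(1) + shortcut_dirs).reshape(-1, feat_dim)
            y_rep = y.repeat_interleave(num_dirs)
            # Predictions must hold on both clean and perturbed features.
            loss = (F.cross_entropy(classifier(z), y)
                    + F.cross_entropy(classifier(z_pert), y_rep))
            opt.zero_grad()
            loss.backward()
            opt.step()
```

A typical call sequence would be dirs = probe_shortcut_directions(backbone, head, loader, feat_dim=512) followed by retrain_for_robustness(backbone, head, loader, dirs). Freezing the backbone and updating only the classifier head keeps the retraining stage lightweight, which is one plausible reading of the efficiency claim below.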
Theoretical analysis and experiments on various datasets show that ShortcutProbe is both effective and efficient at improving a model's robustness to spurious bias.