Concept bottleneck models (CBMs) aim to decompose prediction into human-interpretable concepts: the model first predicts concepts from the input and then predicts the label from those concepts.
However, the concept annotations used to train CBMs are often noisy, which degrades both prediction performance and interpretability.
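For context, the sketch below shows the basic CBM structure and where noisy concept annotations enter the training objective (through the concept supervision term). This is a minimal PyTorch illustration, assuming a simple joint-training setup; the layer sizes, joint loss, and `lam` weight are placeholders, not details of this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptBottleneckModel(nn.Module):
    """Minimal CBM sketch: input -> concept logits -> label logits."""

    def __init__(self, in_dim: int, n_concepts: int, n_classes: int):
        super().__init__()
        # Concept predictor g: maps raw features to concept logits.
        self.concept_net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, n_concepts)
        )
        # Label predictor f: sees only the predicted concepts (the bottleneck).
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x: torch.Tensor):
        concept_logits = self.concept_net(x)
        concepts = torch.sigmoid(concept_logits)  # all task information flows through these
        label_logits = self.label_net(concepts)
        return concept_logits, label_logits


def cbm_joint_loss(concept_logits, label_logits, c_true, y_true, lam=1.0):
    """Joint objective: task loss plus concept supervision.

    Noisy concept annotations affect training through `c_true` in the
    second term, which is why label noise propagates into both the
    concepts and the downstream prediction.
    """
    task_loss = F.cross_entropy(label_logits, y_true)
    concept_loss = F.binary_cross_entropy_with_logits(concept_logits, c_true)
    return task_loss + lam * concept_loss
```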
A study of annotation noise in CBMs shows that corrupted concept labels impair prediction performance, interpretability, and the effectiveness of test-time concept interventions.
To mitigate this vulnerability, a two-stage framework is proposed that stabilizes the learning of noise-sensitive concepts during training and corrects uncertain concept predictions during inference.
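A minimal sketch of how such a two-stage scheme could be wired, assuming a generic confidence-based design: per-concept loss reweighting during training to damp noise-sensitive concepts, and a fallback to a prior for low-confidence concept predictions at inference. The weighting rule, the threshold `tau`, and the prior used here are illustrative assumptions, not the framework's actual components.

```python
import torch
import torch.nn.functional as F

def weighted_concept_loss(concept_logits, c_labels, concept_weights):
    """Training stage (sketch): down-weight concepts judged noise-sensitive.

    `concept_weights` is a per-concept vector in [0, 1]; how it is estimated
    (e.g., from loss statistics across epochs) is left open here and is an
    assumption, not the paper's procedure.
    """
    per_concept = F.binary_cross_entropy_with_logits(
        concept_logits, c_labels, reduction="none"
    )  # shape: (batch, n_concepts)
    return (per_concept * concept_weights).mean()


def correct_uncertain_concepts(concept_logits, concept_prior, tau=0.7):
    """Inference stage (sketch): replace low-confidence concept predictions.

    Concepts whose rescaled confidence 2 * |p - 0.5| falls below `tau` are
    treated as uncertain and fall back to a dataset-level prior; both the
    threshold and the fallback rule are illustrative choices.
    """
    probs = torch.sigmoid(concept_logits)
    confidence = (probs - 0.5).abs() * 2  # 0 = maximally uncertain, 1 = certain
    uncertain = confidence < tau
    return torch.where(uncertain, concept_prior.expand_as(probs), probs)
```

The design intuition behind this kind of split is that noise is handled where it does the most damage: during training, where noisy labels distort the concept predictor, and during inference, where unreliable concept estimates would otherwise propagate directly into the final prediction.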