Machine learning models often suffer performance decay when deployed in new contexts, and this decay can affect some subgroups far more than others.
Understanding why performance differs so sharply across subgroups is essential for applying corrective measures efficiently.
A new framework, Subgroup-scanning Hierarchical Inference Framework for performance drifT (SHIFT), aims to identify subgroups with significant performance decay and to explain the causes of that decay.
Real-world experiments show that SHIFT helps pinpoint interpretable subgroups affected by performance decay and suggests effective mitigation strategies.