Bias evaluation is crucial for trustworthy AI, both for assessing the quality of training data and for auditing the outputs of AI systems.
Classical distance metrics such as Total Variation and Wasserstein have high sample complexity, which limits their usefulness in many practical scenarios.
This paper proposes a new distance metric, Maximum Subgroup Discrepancy (MSD).
Under MSD, two distributions are close if their discrepancy is low on every feature subgroup.
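One schematic way to write this definition is sketched below; the notation (a set of subgroups \(\mathcal{G}\), each fixing the values of some subset of the features) is chosen here for illustration and may differ from the paper's exact formulation.

```latex
% Schematic reading of MSD: the largest probability gap over all subgroups g,
% where each g is defined by fixing the values of a subset of the (protected) features.
\[
  \mathrm{MSD}(P, Q) \;=\; \max_{g \in \mathcal{G}} \bigl|\, P(X \in g) - Q(X \in g) \,\bigr|
\]
```

Under this reading, MSD is small precisely when the two distributions assign similar probability to every such subgroup.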
Despite an exponential number of subgroups, the sample complexity of MSD remains linear in the number of features, making it practical for real-world applications.
An algorithm based on mixed-integer optimization (MIO) is introduced for computing the distance; a sketch of one possible formulation appears below.
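The following is a minimal sketch of one possible MIO formulation, not the paper's exact model. It assumes categorical protected features, represents each dataset as a list of dicts, and uses the open-source PuLP package with its bundled CBC solver; the helper names (`msd_one_sided`, `msd`) and the encoding of a subgroup via one binary selector per feature-value pair are choices made here for illustration.

```python
import pulp


def msd_one_sided(A, B, features):
    """Max over subgroups g of  Phat_A(g) - Phat_B(g), where a subgroup fixes
    one value for some subset of the protected features.
    A, B: lists of dicts mapping feature name -> categorical value."""
    prob = pulp.LpProblem("msd_one_sided", pulp.LpMaximize)

    # s[(f, v)] = 1 iff the subgroup requires feature f to take value v.
    values = {f: sorted({row[f] for row in A + B}) for f in features}
    s = {(f, v): pulp.LpVariable(f"s_{f}_{v}", cat="Binary")
         for f in features for v in values[f]}
    for f in features:
        # At most one value may be fixed per feature; otherwise it is unconstrained.
        prob += pulp.lpSum(s[(f, v)] for v in values[f]) <= 1

    # m[(tag, i)] = 1 iff sample i of dataset `tag` satisfies every fixed constraint.
    m = {}
    for tag, data in (("A", A), ("B", B)):
        for i, row in enumerate(data):
            mi = pulp.LpVariable(f"m_{tag}_{i}", cat="Binary")
            excluding = [s[(f, v)] for f in features for v in values[f] if row[f] != v]
            for sv in excluding:
                prob += mi <= 1 - sv                 # excluded if any mismatching value is selected
            prob += mi >= 1 - pulp.lpSum(excluding)  # included if nothing excludes it
            m[(tag, i)] = mi

    # Objective: difference of the empirical subgroup probabilities.
    prob += (pulp.lpSum(m[("A", i)] for i in range(len(A))) / len(A)
             - pulp.lpSum(m[("B", i)] for i in range(len(B))) / len(B))
    prob.solve(pulp.PULP_CBC_CMD(msg=0))

    subgroup = {f: v for (f, v), var in s.items() if var.value() > 0.5}
    return pulp.value(prob.objective), subgroup


def msd(A, B, features):
    """Symmetrise: the distance is the larger of the two one-sided gaps."""
    d_ab, g_ab = msd_one_sided(A, B, features)
    d_ba, g_ba = msd_one_sided(B, A, features)
    return (d_ab, g_ab) if d_ab >= d_ba else (d_ba, g_ba)
```

Besides the distance value, such a formulation also yields the maximizing subgroup itself, which is what makes the result easy to interpret.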
MSD is easily interpretable, which facilitates identifying and correcting bias.
The paper also introduces a general bias-detection framework, the family of MSDD distances, into which MSD fits naturally.
Empirical comparisons with other metrics on real-world datasets demonstrate the effectiveness of MSD.