Active learning reduces human annotation costs for machine learning systems by strategically selecting the most informative unlabeled data for annotation, but when performed by a single party it may still be insufficient due to limited data diversity and annotation budgets.
Federated Active Learning (FAL) facilitates collaborative data selection and model training while preserving the confidentiality of raw data samples.
Existing FAL methods fail to account for the heterogeneity of data distribution across clients and the associated fluctuations in global and local model parameters, adversely affecting model accuracy.
To address these challenges, we propose CHASe (Client Heterogeneity-Aware Data Selection), which identifies unlabeled samples with high epistemic variations and incorporates techniques for tracking such variations, calibrating decision boundaries, and improving the efficiency of data selection.
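The notion of selecting samples with high epistemic variation can be illustrated by tracking how a model's predicted labels flip across training epochs and ranking unlabeled samples by their flip counts. The sketch below is a minimal illustration under that assumption; the function names and the flip-count scoring are illustrative, not the exact formulation used by CHASe:

```python
import numpy as np

def epistemic_variation(epoch_probs: np.ndarray) -> np.ndarray:
    """Score each unlabeled sample by how often its predicted label
    flips between consecutive epochs.

    epoch_probs: array of shape (num_epochs, num_samples, num_classes)
                 holding per-epoch class probabilities.
    Returns an integer score per sample; higher = more variation.
    """
    labels = epoch_probs.argmax(axis=2)          # (epochs, samples)
    flips = (labels[1:] != labels[:-1]).sum(axis=0)
    return flips

def select_top_k(epoch_probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k samples with the highest epistemic variation."""
    scores = epistemic_variation(epoch_probs)
    return np.argsort(-scores)[:k]

# Toy example: 3 epochs, 2 samples, 2 classes.
# Sample 0 is stable; sample 1 flips every epoch.
probs = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.95, 0.05], [0.1, 0.9]],
])
print(select_top_k(probs, k=1))  # the unstable sample is selected first
```

A stable sample contributes little new information once labeled, whereas a sample whose prediction keeps flipping lies near the decision boundary, which is why such variation serves as a selection signal.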