Federated learning has become a popular way to learn from sensitive data held by multiple parties without sharing the raw data.
Existing federated clustering methods are limited in handling complex data partitioning scenarios.
Data collaboration clustering (DC-Clustering) is a new federated clustering method designed for such complex data partitioning scenarios.
DC-Clustering preserves privacy by having institutions share only intermediate representations of their data, never the raw data.
The method supports both k-means and spectral clustering and obtains its results with a single round of communication to the central server.
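The summary does not spell out how the intermediate representations are built or combined, so the following is only a minimal sketch under stated assumptions, not the DC-Clustering algorithm itself. It assumes a construction modeled on data collaboration analysis: the data are split by rows (horizontal partitioning, a simplification), every institution holds a copy of a shared, non-sensitive anchor dataset, and each institution applies a private dimensionality-reducing linear map (local PCA basis times a secret random rotation) to both its data and the anchor. Each institution sends those two matrices once; the server aligns them through an SVD of the stacked anchor representations and runs k-means. The helper names `local_representation`, `server_cluster`, and `same_partition`, the anchor data, and the farthest-point k-means initialisation are all hypothetical choices for illustration; spectral clustering is not shown.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next centre: the point farthest from every centre chosen so far.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute centres; keep the old one if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def local_representation(X_local, anchor, m, rng):
    """Institution side (hypothetical): apply a private linear map to local and anchor data."""
    _, _, Vt = np.linalg.svd(X_local - X_local.mean(axis=0), full_matrices=False)
    Q, _ = np.linalg.qr(rng.standard_normal((m, m)))  # secret random rotation
    F = Vt[:m].T @ Q  # d x m map; never leaves the institution
    return X_local @ F, anchor @ F  # only these two matrices are sent, once


def server_cluster(reps, anchor_reps, m, k):
    """Server side (hypothetical): align representations via the anchor, then cluster."""
    # Common target basis: top-m left singular vectors of the stacked anchor representations.
    U, _, _ = np.linalg.svd(np.hstack(anchor_reps), full_matrices=False)
    Z = U[:, :m]
    # Least-squares G_i with anchor_rep_i @ G_i ~= Z maps each institution into a shared space.
    collab = [X_t @ (np.linalg.pinv(A_t) @ Z) for X_t, A_t in zip(reps, anchor_reps)]
    return kmeans(np.vstack(collab), k)


def same_partition(a, b):
    """True if two labelings define the same clusters up to renaming."""
    pairs = set(zip(a.tolist(), b.tolist()))
    return len(pairs) == len(set(a.tolist())) == len(set(b.tolist()))


# Synthetic demo: 3 well-separated blobs in 5-D, shuffled, then split by rows across 3 institutions.
rng = np.random.default_rng(0)
d, k, m = 5, 3, 2
y = np.repeat(np.arange(k), 60)
X = rng.normal(scale=10.0, size=(k, d))[y] + rng.normal(scale=0.5, size=(len(y), d))
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
anchor = rng.standard_normal((200, d))  # shared anchor data (an assumption)
parts = np.array_split(np.arange(len(y)), 3)  # contiguous row blocks, one per institution
reps, anchor_reps = zip(*(local_representation(X[idx], anchor, m, rng) for idx in parts))
labels_dc = server_cluster(reps, anchor_reps, m, k)
labels_central = kmeans(X, k)  # centralized baseline on the pooled raw data
print("DC vs centralized agree:", same_partition(labels_dc, labels_central))
```

The sketch relies on each institution holding samples from every cluster, so that the local PCA bases span roughly the same subspace; the secret rotations are then absorbed by the least-squares alignment, which is why a single upload per institution suffices in this toy setting.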
Experiments on synthetic and benchmark datasets show that DC-Clustering achieves performance comparable to centralized clustering on the pooled data.
DC-Clustering fills a gap in federated learning research by enabling effective knowledge discovery from distributed heterogeneous data.
Its privacy preservation, communication efficiency, and flexibility make it valuable for privacy-sensitive domains such as healthcare and finance.