Cluster analysis is a key topic in statistics and machine learning. A recurring challenge is determining whether two subsets of samples belong to the same cluster.
Classic two-sample tests, when applied to subsets that were themselves produced by a clustering procedure, can yield inflated Type-I error rates. This necessitates a new approach, known as the two-cluster test.
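The inflation described above can be illustrated with a small simulation. The following is a minimal sketch, not the proposed two-cluster test: it draws every sample from a single Gaussian (so the null hypothesis holds), splits it either at random or by thresholding at the sample mean, and applies a large-sample Welch z-test. The threshold split is only a stand-in for a clustering algorithm, the z-test is only a stand-in for a classic two-sample test, and all function names and parameter values are hypothetical choices made for this illustration.

```python
import random
from statistics import NormalDist, mean, variance


def two_sample_z_pvalue(a, b):
    """Two-sided p-value of a large-sample Welch z-test for equal means."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def random_split(x, rng):
    """Split into two halves using labels that ignore the data values."""
    y = x[:]
    rng.shuffle(y)
    half = len(y) // 2
    return y[:half], y[half:]


def cluster_split(x, rng):
    """Assumed stand-in for clustering: threshold at the sample mean."""
    m = mean(x)
    return [v for v in x if v <= m], [v for v in x if v > m]


def rejection_rate(split, n=200, trials=500, alpha=0.05, seed=0):
    """Fraction of trials in which the test rejects although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # All n points come from one Gaussian, i.e. one true cluster.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        a, b = split(x, rng)
        if two_sample_z_pvalue(a, b) < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print("random split :", rejection_rate(random_split))
    print("cluster split:", rejection_rate(cluster_split))
```

Under these assumptions, the random split should reject at a rate near the nominal level of 0.05, while the data-dependent split should reject in almost every trial, even though there is only one cluster. The subsets were chosen to differ, so the two-sample test sees a difference that the split itself created.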
A novel method that uses the boundary points between the two subsets is introduced to compute analytical p-values, reducing the Type-I error rate relative to classic two-sample tests.
Experiments on synthetic and real datasets demonstrate the effectiveness of the proposed two-cluster test in various clustering applications, including tree-based interpretable clustering and significance-based hierarchical clustering.