Neuroimaging-based patient stratification holds promise for precision neuropsychiatry.
Dataset characteristics such as cluster separation, size imbalance, noise, and disease-related effects influence clustering algorithm success.
Four widely used stratification algorithms (SuStaIn, HYDRA, SmileGAN, and SurrealGAN) were evaluated on synthetic brain-morphometry cohorts.
Data complexity mattered more than the choice of algorithm for successful stratification.
Well-separated clusters yielded high accuracy, while overlapping or unequal-sized clusters reduced accuracy.
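The effect of separation and size imbalance can be illustrated with a toy experiment. The sketch below is not the study's pipeline and uses none of the four evaluated algorithms: it assumes two 2-D Gaussian "subtypes" as a stand-in for morphometric features and substitutes plain k-means for the stratification step. The function names (`make_cohort`, `two_means`, `accuracy`) and the scenario settings are hypothetical, chosen only for illustration.

```python
import random


def make_cohort(n_a, n_b, separation, seed=0):
    """Draw two unit-variance 2-D Gaussian subtypes whose centres lie `separation` apart on x."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, n, centre_x in ((0, n_a, 0.0), (1, n_b, separation)):
        for _ in range(n):
            points.append((rng.gauss(centre_x, 1.0), rng.gauss(0.0, 1.0)))
            labels.append(label)
    return points, labels


def _dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def two_means(points, n_iter=50):
    """Lloyd's k-means with k=2, started from the points with smallest and largest x."""
    centres = [min(points), max(points)]
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        assignment = [
            0 if _dist2(p, centres[0]) <= _dist2(p, centres[1]) else 1
            for p in points
        ]
        # Move each centre to the mean of its members.
        for k in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centre if a cluster empties
                centres[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return assignment


def accuracy(true_labels, predicted):
    """Fraction of correct assignments under the better of the two label mappings."""
    agree = sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)
    return max(agree, 1.0 - agree)


# Hypothetical scenarios: (description, size of subtype A, size of subtype B, separation)
scenarios = [
    ("well separated, balanced", 200, 200, 6.0),
    ("overlapping, balanced", 200, 200, 1.0),
    ("overlapping, imbalanced", 360, 40, 1.0),
]
for name, n_a, n_b, sep in scenarios:
    pts, truth = make_cohort(n_a, n_b, sep)
    print(f"{name}: accuracy = {accuracy(truth, two_means(pts)):.2f}")
```

Changing only `separation` or the subtype sizes while keeping the algorithm fixed makes it possible to attribute any change in accuracy to the data rather than to the method, which is the comparison the summary describes.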
SuStaIn had limitations in scaling; HYDRA's accuracy varied with data heterogeneity; and SmileGAN and SurrealGAN detected patterns but did not assign discrete labels.
The study stresses that dataset properties shape algorithm success and calls for the use of realistic dataset distributions in algorithm development.