Unsupervised machine learning is commonly used to extract insights from large, unlabeled datasets in fields like climate science, biomedicine, astronomy, and chemistry.
This paper addresses the lack of standardization in unsupervised learning for scientific discovery by presenting a structured workflow.
The workflow includes steps such as formulating valid scientific questions, preparing data robustly, exploring modeling techniques, validating results rigorously, and communicating findings effectively.
An astronomy case study on refining globular clusters of Milky Way stars based on chemical composition demonstrates the importance of validation and shows how a well-designed workflow can enhance scientific discovery.