Researchers from the National Institute of Health Data Science at Peking University have published a systematic review that evaluates the use of machine learning methods for handling missing data in electronic health records (EHRs).
According to the study, traditional statistical techniques for addressing missing data in EHRs frequently fall short.
Machine learning methods such as Generative Adversarial Networks (GANs) and k-Nearest Neighbors (KNN) have been shown to consistently improve the handling of missing data in both longitudinal and cross-sectional datasets.
However, the study revealed that no single technique is a panacea for all EHR data scenarios, highlighting the need to select a method suited to the type of dataset.
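To make the idea of KNN-based imputation concrete, here is a minimal sketch in plain Python. It is an illustration only, not the method evaluated in the review: the function name `knn_impute`, the choice of `k`, the distance measure, and the toy patient records (age, systolic blood pressure, heart rate) are all assumptions made up for this example.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Hypothetical helper for illustration. Distances use only the columns
    observed in both rows, and features are left unscaled for brevity.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must be a different row that has column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # Root-mean-square difference over the shared columns.
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Made-up records: [age, systolic blood pressure, heart rate]; None marks a gap.
records = [
    [50.0, 120.0, 70.0],
    [52.0, None, 72.0],
    [80.0, 160.0, 90.0],
    [78.0, 158.0, None],
    [51.0, 122.0, 71.0],
]

filled = knn_impute(records, k=2)
for row in filled:
    print(row)
```

In this toy data the second record's missing blood pressure is filled from the two most similar patients. The fourth record's missing heart rate is averaged over one close and one distant neighbour, a small reminder that the result depends on `k` and on how much comparable data is available.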
The authors propose a standardized protocol for handling missing data in electronic health records, which they hope will become universally accepted and make findings across medical research more reliable and reproducible.
However, the study found that the heterogeneity of electronic health records complicates any one-size-fits-all approach to data imputation.
Future research will need to establish universal benchmarks for evaluating machine learning methodologies used to address missing data in EHRs.
This study serves as groundwork for ongoing advancements within the field of health data science.
Advancements in managing missing data in EHRs could unlock the potential of electronic health records, influencing clinical practices, healthcare policy decisions, and patient outcomes across the globe.