The paper introduces the Medical Data Pecking approach for evaluating the quality of structured Electronic Health Record (EHR) data used for research and AI training.
The approach applies unit testing and coverage concepts from software engineering to identify data quality concerns, and includes a supporting tool, the Medical Data Pecking Tool (MDPT).
MDPT was tested on three datasets and identified non-aligned or non-conforming data, demonstrating its effectiveness in detecting data quality issues that matter for research use.
The approach also incorporates external medical knowledge to make data quality tests context-sensitive, with the aim of addressing challenges in data quality assessment.
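
To make the idea of unit testing and coverage applied to data more concrete, the following is a minimal Python sketch. It is not the MDPT implementation: the record fields, the `DataTest` class, the two rules, and the field-level definition of coverage are all assumptions chosen for illustration. Each test is a small predicate over one record, one of which encodes a simple piece of medical knowledge (systolic pressure should exceed diastolic pressure), and coverage is taken here to be the fraction of data fields exercised by at least one test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A record is a flat mapping of field name to value (hypothetical schema).
Record = Dict[str, object]


@dataclass
class DataTest:
    """One unit test over a single record (illustrative, not the MDPT API)."""
    name: str
    fields: Tuple[str, ...]          # fields the test reads, used for coverage
    check: Callable[[Record], bool]  # returns True when the record passes


def run_tests(records: List[Record], tests: List[DataTest]) -> Dict[str, List[int]]:
    """Return, for each test name, the indices of records that fail it."""
    failures: Dict[str, List[int]] = {t.name: [] for t in tests}
    for index, record in enumerate(records):
        for test in tests:
            if not test.check(record):
                failures[test.name].append(index)
    return failures


def field_coverage(records: List[Record], tests: List[DataTest]) -> float:
    """Fraction of fields present in the data that at least one test reads."""
    all_fields = set()
    for record in records:
        all_fields.update(record.keys())
    tested_fields = set()
    for test in tests:
        tested_fields.update(test.fields)
    if not all_fields:
        return 0.0
    return len(tested_fields & all_fields) / len(all_fields)


# Toy data, invented for this example only.
records = [
    {"patient_id": 1, "age": 34, "systolic_bp": 118, "diastolic_bp": 76, "diagnosis": "hypertension"},
    {"patient_id": 2, "age": 51, "systolic_bp": 70, "diastolic_bp": 95, "diagnosis": "diabetes"},
    {"patient_id": 3, "age": -4, "systolic_bp": 120, "diastolic_bp": 80, "diagnosis": "asthma"},
]

tests = [
    # Conformance-style check: age must fall in a plausible range.
    DataTest("age_in_plausible_range", ("age",), lambda r: 0 <= r["age"] <= 120),
    # Knowledge-based check: systolic pressure should exceed diastolic pressure.
    DataTest(
        "systolic_exceeds_diastolic",
        ("systolic_bp", "diastolic_bp"),
        lambda r: r["systolic_bp"] > r["diastolic_bp"],
    ),
]

failures = run_tests(records, tests)
coverage = field_coverage(records, tests)

print(failures)
print(f"field coverage: {coverage:.0%}")
```

In this sketch the coverage figure plays the same role as code coverage in software testing: a low value signals that some fields (here `patient_id` and `diagnosis`) are not checked by any test, so problems in them would go unnoticed.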