Imputing missing values in tabular datasets is a persistent challenge in data science and machine learning, particularly in socioeconomic research.
Strict data protection protocols limit the sharing of real-world socioeconomic datasets, hindering reproducibility and benchmark studies.
To address this, researchers built the IMAGIC-500 dataset from the World Bank's synthetic dataset, providing a shareable resource for evaluating missing data imputation methods on socioeconomic features.
The benchmark measures imputation accuracy under different missingness mechanisms and missing-data ratios, with the aim of advancing the development of robust imputation algorithms for social science research.
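As a rough illustration of what evaluating imputation across missing-data ratios involves, here is a minimal Python sketch. It assumes the MCAR (missing completely at random) mechanism, column-mean imputation as the baseline method, and RMSE over the masked cells as the accuracy metric. These choices, the helper names (`mask_mcar`, `mean_impute`, `rmse_on_masked`) and the toy data are hypothetical; they are not the IMAGIC-500 benchmark's actual protocol or data.

```python
import math
import random
import statistics


def mask_mcar(rows, missing_ratio, seed=0):
    """Hypothetical helper: copy `rows`, setting cells to None completely at random.

    Each cell is dropped independently with probability `missing_ratio`,
    which is the MCAR (missing completely at random) mechanism.
    """
    rng = random.Random(seed)
    return [[None if rng.random() < missing_ratio else v for v in row] for row in rows]


def mean_impute(rows):
    """Hypothetical baseline: fill each None with its column's observed mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column is entirely missing.
        means.append(statistics.fmean(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def rmse_on_masked(original, masked, imputed):
    """RMSE computed only over the cells that were masked (None in `masked`)."""
    errors = [
        (imputed[i][j] - original[i][j]) ** 2
        for i, row in enumerate(masked)
        for j, v in enumerate(row)
        if v is None
    ]
    return math.sqrt(statistics.fmean(errors)) if errors else 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy table: 500 rows x 3 numeric features (not IMAGIC-500 data).
    data = [[rng.gauss(50, 10), rng.gauss(3, 1), rng.uniform(0, 1)] for _ in range(500)]
    for ratio in (0.1, 0.3, 0.5):
        masked = mask_mcar(data, ratio, seed=1)
        imputed = mean_impute(masked)
        print(f"ratio={ratio:.1f}  RMSE={rmse_on_masked(data, masked, imputed):.3f}")
```

Scoring only the cells that were masked keeps the metric focused on imputation error rather than on values that were never missing. Under MAR (missing at random) or MNAR (missing not at random), the drop probability would instead depend on observed values or on the missing values themselves, respectively, so `mask_mcar` would need to be replaced by a mechanism-specific masking function.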