<ul><li>Imputing missing values in tabular datasets is a persistent challenge in data science and machine learning, particularly in socioeconomic research.</li><li>Strict data protection protocols limit the sharing of real-world socioeconomic datasets, which hinders reproducibility and benchmarking studies.</li><li>Researchers built the IMAGIC-500 dataset from the World Bank's synthetic dataset to evaluate missing data imputation methods on socioeconomic features.</li><li>The benchmark measures imputation accuracy across different missing-data mechanisms and missing ratios, with the aim of advancing robust imputation algorithms for social science research.</li></ul>
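To illustrate what "imputation accuracy for a given missing mechanism and ratio" usually means in practice, here is a minimal sketch. It is not the IMAGIC-500 evaluation code. It assumes the simplest mechanism, missing completely at random (MCAR), uses a column-mean baseline imputer, and scores only the hidden cells with RMSE. The helper names (`mcar_mask`, `apply_mask`, `mean_impute`, `masked_rmse`) and the toy table are hypothetical and chosen for illustration only.

```python
import math
import random


def mcar_mask(n_rows, n_cols, ratio, seed=0):
    """Pick cells to hide uniformly at random (MCAR); returns a set of (row, col)."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    k = round(ratio * len(cells))  # number of cells to hide at this missing ratio
    return set(rng.sample(cells, k))


def apply_mask(data, mask):
    """Return a copy of the table with masked cells replaced by None."""
    return [[None if (i, j) in mask else v for j, v in enumerate(row)]
            for i, row in enumerate(data)]


def mean_impute(masked):
    """Baseline imputer: fill each missing cell with its column's observed mean."""
    n_cols = len(masked[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in masked if row[j] is not None]
        # Fall back to 0.0 if an entire column happens to be masked.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def masked_rmse(truth, imputed, mask):
    """RMSE computed only over the cells that were hidden."""
    if not mask:
        return 0.0
    sq_err = sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in mask)
    return math.sqrt(sq_err / len(mask))


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy numeric table (NOT IMAGIC-500 data): 200 rows x 3 columns.
    truth = [[rng.gauss(50, 10), rng.gauss(5, 2), rng.uniform(0, 1)]
             for _ in range(200)]
    # Sweep several missing ratios, as a benchmark would.
    for ratio in (0.1, 0.3, 0.5):
        mask = mcar_mask(len(truth), len(truth[0]), ratio, seed=1)
        imputed = mean_impute(apply_mask(truth, mask))
        print(f"ratio={ratio:.1f}  RMSE={masked_rmse(truth, imputed, mask):.3f}")
```

Other mechanisms would change only the masking step: under MAR the chance of hiding a cell depends on other observed values, and under MNAR it depends on the hidden value itself. The imputer and the masked-cell scoring can stay the same, which is what lets one benchmark compare methods across mechanisms and ratios.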

IMAGIC-500: IMputation benchmark on A Generative Imaginary Country (500k samples)
