Data corruption, including missing and noisy data, presents challenges in machine learning.
This study explores strategies to mitigate the effects of data corruption in two settings: supervised learning on NLP tasks and deep reinforcement learning for traffic signal optimization.
The analysis shows that model performance under data corruption follows a diminishing-return curve, and that noisy data causes severe performance degradation, especially in sequential decision-making tasks.
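To make the shape of a diminishing-return curve concrete, the sketch below uses a saturating exponential as an illustrative stand-in. The functional form, the `ceiling` and `rate` parameters, and the name `saturating_score` are assumptions introduced here for illustration; they are not taken from the study, and the numbers produced are not its results.

```python
import math


def saturating_score(clean_fraction, ceiling=1.0, rate=5.0):
    """Hypothetical diminishing-return curve (assumed form, not the study's fit).

    The score rises quickly for small clean fractions and flattens
    as the clean fraction approaches 1.
    """
    return ceiling * (1.0 - math.exp(-rate * clean_fraction))


# Evaluate the assumed curve at clean fractions 0.0, 0.1, ..., 1.0.
fractions = [i / 10 for i in range(11)]
scores = [saturating_score(f) for f in fractions]

# Marginal gain contributed by each additional 10% of clean data.
gains = [later - earlier for earlier, later in zip(scores, scores[1:])]

for fraction, gain in zip(fractions[1:], gains):
    print(f"clean fraction {fraction:.1f}: marginal gain {gain:.3f}")
```

Under this assumed form, each additional slice of clean data contributes less than the previous one, which is the qualitative behavior a diminishing-return curve describes.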
Increasing dataset size helps mitigate the effects of data corruption, but an empirical rule emerges: approximately 30% of the data is critical for determining performance.