Week 5 focuses on critical preprocessing techniques in data science: handling missing values (MCAR, MAR, MNAR), resampling techniques (undersampling, oversampling, SMOTE), and data encoding.
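Missing values are gaps in a dataset, classified by why the data is missing: MCAR (Missing Completely At Random, unrelated to any variable), MAR (Missing At Random, related only to observed variables), and MNAR (Missing Not At Random, related to the unobserved value itself); the mechanism determines how much bias a given handling strategy can introduce into analyses and models.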
Handling strategies include deleting rows with missing values, deleting columns that are mostly missing, and imputing the gaps with the mean, median, or mode of the observed values.
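The imputation strategies above can be sketched in plain Python. This is a minimal illustration, assuming missing entries are represented as None; the function name impute and the sample ages column are hypothetical, and in practice a library such as pandas or scikit-learn would normally be used instead.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean"):
    """Replace None entries with a statistic of the observed values.

    `strategy` is one of "mean", "median", or "mode" (hypothetical helper).
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [22, None, 30, 22, None, 40]
ages_mean = impute(ages, "mean")
ages_median = impute(ages, "median")
ages_mode = impute(ages, "mode")
```

The median is usually the safer choice for skewed numeric columns, while the mode is the only one of the three that also applies to categorical columns.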
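Resampling techniques balance the class distribution of a dataset before training machine learning models: undersampling removes samples from the majority class, oversampling repeats samples from the minority class, and SMOTE adds synthetic minority samples.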
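Random undersampling and oversampling can be sketched as follows. This is an illustrative version using only the standard library; the function names, the seed parameter, and the toy data are assumptions made for the example, not part of the course material.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Repeat randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical imbalanced dataset: 8 samples of class 0, 2 of class 1.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
```

Undersampling discards information from the majority class, while plain oversampling only repeats existing rows; resampling is normally applied to the training split only.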
SMOTE (Synthetic Minority Over-sampling Technique) addresses class imbalance by generating synthetic samples for the minority class.
It creates each new sample by interpolating between an existing minority-class sample and one of its nearest minority-class neighbors, rather than duplicating existing rows.
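The interpolation step can be illustrated with a simplified sketch. It assumes numeric features and Euclidean distance; the function name smote, its parameters k and seed, and the sample points are hypothetical. It is not the reference implementation (the imbalanced-learn library provides a full one).

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Simplified sketch: pick a random base point, pick one of its k nearest
    minority neighbors, and place a new point on the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbors of the base point, excluding itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority class: four points on a unit square.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_points = smote(minority, n_new=5)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.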
Encoding methods covered include Nominal/One-Hot, Label, Ordinal, and Target-Guided Ordinal Encoding.
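The four encodings can be compared side by side in a short sketch. The helper names and toy data below are assumptions for illustration; in particular, target-guided ordinal encoding is taken here to mean ranking categories by the mean of the target, in ascending order.

```python
from collections import defaultdict


def one_hot_encode(values):
    """Nominal/one-hot: one 0/1 column per category, no implied order."""
    cats = sorted(set(values))
    return [[1 if v == c else 0 for c in cats] for v in values], cats


def label_encode(values):
    """Label: an arbitrary integer per category (alphabetical here)."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def ordinal_encode(values, order):
    """Ordinal: integers that follow a caller-supplied category order."""
    rank = {cat: i for i, cat in enumerate(order)}
    return [rank[v] for v in values]


def target_guided_encode(values, target):
    """Target-guided ordinal: rank categories by their mean target value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, target):
        sums[v] += t
        counts[v] += 1
    means = {c: sums[c] / counts[c] for c in counts}
    mapping = {c: i for i, c in enumerate(sorted(means, key=means.get))}
    return [mapping[v] for v in values], mapping


# Hypothetical data for each encoder.
colors = ["red", "green", "blue", "green"]
sizes = ["small", "large", "medium"]
cities = ["A", "B", "A", "C", "B", "C"]
prices = [100, 300, 200, 50, 500, 150]

one_hot, one_hot_columns = one_hot_encode(colors)
color_codes, color_map = label_encode(colors)
size_codes = ordinal_encode(sizes, order=["small", "medium", "large"])
city_codes, city_map = target_guided_encode(cities, prices)
```

One-hot encoding suits unordered categories with few distinct values; ordinal and target-guided encodings keep a single column but assume the assigned order is meaningful to the model.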
Together, these preprocessing steps (handling missing data, balancing classes, and encoding categorical variables) are what make machine learning models robust to imperfect real-world data.
Stay tuned for Week 6 where Machine Learning and AI topics will be explored further.