Medium


Week 5: Handling Missing Values, Resampling, and Data Encoding in Data Science

  • Week 5 focuses on critical preprocessing techniques in data science: handling missing values (MCAR, MAR, MNAR), resampling techniques (undersampling, oversampling, SMOTE), and data encoding.
  • Missing values are gaps in a dataset. They are categorized as Missing Completely at Random (MCAR), Missing at Random (MAR), or Missing Not at Random (MNAR), and they affect downstream analyses and models.
  • Handling strategies include deleting rows with missing values, deleting columns with missing values, and imputing with the mean, median, or mode.
  • Resampling techniques like undersampling, oversampling, and SMOTE are used to balance datasets for machine learning models.
  • SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic samples for the minority class to address class imbalance.
  • SMOTE creates each new sample by interpolating between existing minority-class samples, which improves dataset balance without simply duplicating rows.
  • Encoding methods covered include Nominal/One-Hot, Label, Ordinal, and Target-Guided Ordinal Encoding.
  • Preprocessing techniques are crucial for robust machine learning models, aiding in handling missing data, balancing datasets, and encoding categorical variables.
  • Stay tuned for Week 6 where Machine Learning and AI topics will be explored further.
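
The missing-value strategies in the summary (row deletion and mean, median, or mode imputation) can be sketched in plain Python. This is a minimal illustration, not the article's own code: the helper names `drop_incomplete_rows` and `impute`, the use of `None` to mark a missing value, and the sample data are all assumptions made for the example.

```python
from statistics import mean, median, mode

def drop_incomplete_rows(rows):
    """Row deletion: keep only rows that have no missing (None) entries."""
    return [row for row in rows if None not in row]

def impute(values, strategy="mean"):
    """Imputation: replace None with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# Hypothetical data: (age, city) records with gaps.
rows = [(22, "Pune"), (None, "Delhi"), (35, None), (29, "Pune")]
complete_rows = drop_incomplete_rows(rows)

ages = [22, None, 35, 29, None, 35]       # numeric column
cities = ["Pune", None, "Delhi", "Pune"]  # categorical column

ages_mean = impute(ages, "mean")        # gaps filled with 30.25
ages_median = impute(ages, "median")    # gaps filled with 32.0
cities_mode = impute(cities, "mode")    # gaps filled with "Pune"
```

Mean and median apply to numeric columns, while mode also works for categorical ones. In practice a library routine such as pandas `fillna` would usually replace these hand-written helpers.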
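
Random undersampling and oversampling, mentioned above as ways to balance a dataset, might look like the following sketch. The function names, the fixed `seed` parameter, and the toy 8-versus-2 dataset are assumptions for illustration only.

```python
import random
from collections import Counter

def _group_by_label(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels

def random_undersample(rows, labels, seed=0):
    """Randomly discard rows until every class matches the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_label(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels

# Hypothetical imbalanced dataset: 8 majority rows (label 0), 2 minority rows (label 1).
rows = [(i,) for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(Counter(over_labels), Counter(under_labels))
```

Oversampling keeps every original row but repeats minority rows, while undersampling throws majority rows away. Resampling is normally applied to the training split only, so that evaluation data keeps its original class distribution.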
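
The interpolation idea behind SMOTE can be sketched as follows. This is a simplified illustration rather than a reference implementation: `smote_sketch`, its parameters, and the sample points are hypothetical, and it assumes numeric features and at least two minority samples. A maintained implementation is available as `SMOTE` in the imbalanced-learn package.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority-class points.

    For each new point: pick a minority sample, pick one of its k nearest
    minority neighbors, and place the new point at a random position on
    the line segment between the two.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of `base`, excluding the point itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic

# Hypothetical minority-class samples with two numeric features.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (3.0, 3.0)]
new_points = smote_sketch(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region the minority class already occupies instead of repeating existing rows exactly.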
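
The four encoding methods listed above can be compared side by side in a short sketch. All function names and sample values here are assumptions for illustration. In particular, the target-guided variant below ranks categories by their mean target value, which is one common formulation and may differ in detail from the article's.

```python
from collections import defaultdict

def one_hot(values):
    """Nominal/one-hot: one 0/1 column per category (columns sorted by name)."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values], categories

def label_encode(values):
    """Label: arbitrary integer per category (here, alphabetical order)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def ordinal_encode(values, order):
    """Ordinal: integer rank taken from a caller-supplied category order."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]

def target_guided_ordinal(values, target):
    """Target-guided ordinal: rank categories by their mean target value."""
    grouped = defaultdict(list)
    for v, t in zip(values, target):
        grouped[v].append(t)
    means = {c: sum(ts) / len(ts) for c, ts in grouped.items()}
    mapping = {c: i for i, c in enumerate(sorted(means, key=means.get))}
    return [mapping[v] for v in values], mapping

# Hypothetical categorical columns.
colors = ["red", "green", "blue", "green"]
sizes = ["small", "large", "medium", "small"]
cities = ["A", "B", "A", "C", "B", "C"]
prices = [100, 300, 120, 200, 320, 220]  # hypothetical numeric target

color_matrix, color_columns = one_hot(colors)
color_labels, _ = label_encode(colors)
size_ranks = ordinal_encode(sizes, order=["small", "medium", "large"])
city_ranks, city_mapping = target_guided_ordinal(cities, prices)
```

One-hot encoding suits nominal categories with no inherent order, ordinal encoding suits categories with a natural order, and the target-guided variant derives the order from the data. Because it uses the target, its mapping should be fitted on training data only.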
