Imbalanced binary classification problems are common in various fields of study.
Subsampling the majority class to create a balanced training dataset biases the model's predicted probabilities, because the class prior in the training data no longer matches the population prevalence.
Recalibrating a random forest model using prevalence estimates can itself have unintended negative consequences, including upwardly biased prevalence estimates.
A random forest's prevalence estimates also depend on the number of predictors considered at each split and on the sampling rate used to grow each tree, tuning choices that introduce unexpected biases of their own.
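To make the subsampling bias concrete, the sketch below works through a minimal, assumption-laden example: for a well-calibrated score, undersampling the majority (negative) class at rate beta inflates the posterior probability by Bayes' rule, and a standard prior-shift correction (in the style of Elkan, 2001) maps the balanced-data score back to the original prevalence. The function names (`balanced_posterior`, `correct_probability`) and the specific numbers are illustrative assumptions, not the method described above; random forests in particular may deviate from this idealized behavior.

```python
# Illustrative sketch (not the paper's method): how undersampling negatives
# shifts a well-calibrated probability, and an Elkan-style prior correction
# that maps the balanced-data score back to the original class distribution.

prevalence = 0.05                       # assumed true share of positives
beta = prevalence / (1 - prevalence)    # keep this fraction of negatives
                                        # so the training set is ~balanced

def balanced_posterior(p_true):
    """Posterior a perfectly calibrated model would output on the
    balanced training data, given the true-prior posterior p_true."""
    odds = p_true / (1 - p_true)        # odds under the true prior
    odds_bal = odds / beta              # dropping negatives at rate beta
                                        # multiplies the odds by 1/beta
    return odds_bal / (1 + odds_bal)

def correct_probability(p_balanced, beta):
    """Map a balanced-data probability back to the original prevalence
    (prior-shift correction in the style of Elkan, 2001)."""
    return beta * p_balanced / (beta * p_balanced - p_balanced + 1)

p_true = 0.10                           # true-prior posterior for one case
p_bal = balanced_posterior(p_true)      # inflated score after subsampling
p_back = correct_probability(p_bal, beta)
print(round(p_bal, 3), round(p_back, 3))   # score inflates, then recovers
```

Under these idealized assumptions the correction exactly undoes the prior shift; the point of the passage above is that with a real random forest the recovery is imperfect, so the corrected estimates can remain biased.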