Imbalanced binary classification problems are common in various fields of study.
Subsampling the majority class to create a balanced training dataset biases the model's predicted probabilities, because the class prior in the training data no longer matches the population prevalence.
Recalibrating a random forest model using prevalence estimates can itself have unintended negative consequences, including upwardly biased prevalence estimates.
A random forest's prevalence estimates also depend on the number of predictors considered at each split and on the sampling rate used to grow each tree, tuning choices that introduce unexpected biases of their own.
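To make the subsampling bias concrete, the sketch below works through a minimal, assumption-laden example: for a well-calibrated score, undersampling the majority (negative) class at rate beta inflates the posterior probability by Bayes' rule, and a standard prior-shift correction (in the style of Elkan, 2001) maps the balanced-data score back to the original prevalence. The function names (`balanced_posterior`, `correct_probability`) and the specific numbers are illustrative assumptions, not the method described above; random forests in particular may deviate from this idealized behavior.

```python
# Illustrative sketch (not the paper's method): how undersampling negatives
# shifts a well-calibrated probability, and an Elkan-style prior correction
# that maps the balanced-data score back to the original class distribution.

prevalence = 0.05                       # assumed true share of positives
beta = prevalence / (1 - prevalence)    # keep this fraction of negatives
                                        # so the training set is ~balanced

def balanced_posterior(p_true):
    """Posterior a perfectly calibrated model would output on the
    balanced training data, given the true-prior posterior p_true."""
    odds = p_true / (1 - p_true)        # odds under the true prior
    odds_bal = odds / beta              # dropping negatives at rate beta
                                        # multiplies the odds by 1/beta
    return odds_bal / (1 + odds_bal)

def correct_probability(p_balanced, beta):
    """Map a balanced-data probability back to the original prevalence
    (prior-shift correction in the style of Elkan, 2001)."""
    return beta * p_balanced / (beta * p_balanced - p_balanced + 1)

p_true = 0.10                           # true-prior posterior for one case
p_bal = balanced_posterior(p_true)      # inflated score after subsampling
p_back = correct_probability(p_bal, beta)
print(round(p_bal, 3), round(p_back, 3))   # score inflates, then recovers
```

Under these idealized assumptions the correction exactly undoes the prior shift; the point of the passage above is that with a real random forest the recovery is imperfect, so the corrected estimates can remain biased.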