This study proposes an oversampling strategy called MGS-GRF for rare event detection in binary classification on tabular data.
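MGS-GRF is designed to handle mixed features (continuous and categorical variables) and exhibits two properties: coherence, meaning that it generates only combinations of categorical values already present in the original data, and association, meaning that it preserves the dependence between continuous and categorical features.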
The method uses a kernel density estimator and locally estimated full-rank covariances to generate continuous features, while categorical features are drawn from the original samples through a generalized random forest.
Experimental results on both simulated and real-world datasets show that MGS-GRF achieves better predictive performance than other synthetic oversampling procedures.
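
To make the generation step concrete, the following is a minimal sketch of the two-stage idea, not the authors' implementation. Continuous features are sampled from a Gaussian whose mean and covariance are estimated on a local neighborhood of minority samples, and categorical features are then copied as a whole from one original minority sample. The function name `mgs_like_oversample` and the parameters `k` and `bandwidth` are hypothetical. As a loudly stated simplification, the generalized random forest is replaced here by Gaussian kernel weights on distance; in the method described above, a forest would supply the weights over original samples instead.

```python
import numpy as np

def mgs_like_oversample(X_cont, X_cat, n_new, k=5, bandwidth=1.0, seed=0):
    """Hypothetical sketch of a two-stage mixed-feature oversampler.

    X_cont: (n, d) float array, continuous features of minority samples.
    X_cat:  (n, q) array, categorical features of the same samples.
    Returns (new_cont, new_cat) with n_new synthetic rows each.
    """
    rng = np.random.default_rng(seed)
    n, d = X_cont.shape
    # Pairwise squared Euclidean distances between minority samples.
    d2 = ((X_cont[:, None, :] - X_cont[None, :, :]) ** 2).sum(axis=-1)
    new_cont = np.empty((n_new, d))
    new_cat = np.empty((n_new,) + X_cat.shape[1:], dtype=X_cat.dtype)
    for j in range(n_new):
        # Stage 1: pick a central minority point and its k nearest neighbors,
        # then sample from a Gaussian with the local mean and covariance.
        c = rng.integers(n)
        nbrs = np.argsort(d2[c])[: k + 1]  # the center plus its k neighbors
        mu = X_cont[nbrs].mean(axis=0)
        # The covariance is full rank only if the neighborhood spans all d
        # directions, which requires at least k + 1 > d points.
        cov = np.cov(X_cont[nbrs], rowvar=False).reshape(d, d)
        new_cont[j] = rng.multivariate_normal(mu, cov)
        # Stage 2 (ASSUMPTION: kernel weights stand in for forest weights):
        # draw one original sample and copy its categorical values together,
        # so only category combinations seen in the data can be generated.
        dist2 = ((X_cont - new_cont[j]) ** 2).sum(axis=1)
        w = np.exp(-(dist2 - dist2.min()) / (2.0 * bandwidth ** 2))
        i = rng.choice(n, p=w / w.sum())
        new_cat[j] = X_cat[i]
    return new_cont, new_cat
```

Because the categorical values are copied row-wise from an original sample chosen according to the synthetic continuous values, the sketch cannot produce an unseen category combination, and the categories it draws depend on the continuous features. This is the intuition behind the coherence and association properties, under the simplifications noted above.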