Benign overfitting is a phenomenon in machine learning where a model perfectly fits the training data, including noisy examples, yet still generalizes well to unseen data.
In this work, a conceptual shift towards almost benign overfitting is introduced: rather than requiring a perfect fit, the focus is on models that simultaneously achieve small (but not necessarily zero) training error and small test error, a behavior commonly observed in practical neural networks.
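As a rough formalization of this contrast (the notation is illustrative and not taken verbatim from the paper), write \(R(f)\) for the population risk, \(\widehat{R}_n(f)\) for the empirical risk on \(n\) samples, \(\hat f_n\) for the learned model, and \(f^{*}\) for the Bayes predictor. Benign overfitting describes estimators with
\[
\widehat{R}_n(\hat f_n) = 0 \quad \text{and} \quad R(\hat f_n) - R(f^{*}) \to 0,
\]
whereas almost benign overfitting only asks that, for any targets \(\varepsilon, \delta > 0\), one can attain
\[
\widehat{R}_n(\hat f_n) \le \varepsilon \quad \text{and} \quad R(\hat f_n) - R(f^{*}) \le \delta
\]
for sufficiently large sample size and model complexity; the paper's exact definitions may differ in detail.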
The study analyzes how the interplay between sample size and model complexity enables larger models to fit the training data well while still approaching Bayes-optimal generalization in classical regimes.
The research provides theoretical support through two case studies: kernel ridge regression, and least-squares regression with a two-layer fully connected ReLU neural network. It also introduces a novel proof technique that decomposes the excess risk into an estimation error and an approximation error.
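A sketch of the kind of decomposition referenced above, stated here in generic form (the paper's precise statement may differ): with \(\mathcal{F}\) denoting the model class,
\[
R(\hat f_n) - R(f^{*})
= \underbrace{\,R(\hat f_n) - \inf_{f \in \mathcal{F}} R(f)\,}_{\text{estimation error}}
\;+\;
\underbrace{\,\inf_{f \in \mathcal{F}} R(f) - R(f^{*})\,}_{\text{approximation error}}.
\]
In bounds of this type, the estimation error typically grows with model complexity relative to the sample size, while the approximation error shrinks as the class \(\mathcal{F}\) grows, which is where the interaction between sample size and model complexity enters.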