Data pruning techniques are essential for training advanced machine learning models on massive datasets.
Existing pruning techniques often require a full initial training pass to score samples, an overhead that is hard to justify when the model will only be trained once.
A new importance-score extrapolation framework has been introduced that predicts each sample's importance while using only minimal data.
The framework is shown to be effective across diverse datasets and training paradigms, and it makes otherwise expensive score-calculation methods scalable.
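The core idea can be sketched as follows. This is a hypothetical minimal illustration, not the framework's actual implementation: importance scores (however they are computed) are recorded at a few early checkpoints, a simple per-sample linear trend is fit to those scores, and the trend is extrapolated to a later epoch to decide which samples to keep. The function names, the linear-trend choice, and the top-k pruning rule are all assumptions for illustration.

```python
import numpy as np

def extrapolate_scores(checkpoint_scores, checkpoint_epochs, target_epoch):
    """Fit a per-sample linear trend to early importance scores and
    extrapolate each sample's score to `target_epoch`.

    checkpoint_scores: array of shape (n_checkpoints, n_samples),
        importance scores recorded at a few early epochs.
    checkpoint_epochs: the epochs at which scores were recorded.
    """
    t = np.asarray(checkpoint_epochs, dtype=float)
    y = np.asarray(checkpoint_scores, dtype=float)
    # Closed-form least-squares slope and intercept, vectorized over samples.
    t_centered = t - t.mean()
    slope = (t_centered[:, None] * (y - y.mean(axis=0))).sum(axis=0) \
        / (t_centered ** 2).sum()
    intercept = y.mean(axis=0) - slope * t.mean()
    return intercept + slope * target_epoch

def prune_indices(predicted_scores, keep_fraction):
    """Keep the indices of the highest predicted-importance samples."""
    n_keep = int(len(predicted_scores) * keep_fraction)
    return np.argsort(predicted_scores)[::-1][:n_keep]
```

Because the expensive score computation runs only at a handful of early checkpoints, the cost scales with the number of checkpoints rather than with a full training run, which is the scalability benefit described above.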