Self-supervised pre-training of medical foundation models on large-scale datasets is a common route to strong performance. However, simply increasing the volume of pre-training data does not necessarily improve model performance. Introducing V-information into self-supervised pre-training provides a theoretical foundation for sample selection. OptiDEL, an optimized data-effective learning method, outperforms existing approaches on multiple datasets while using 20x less training data.
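The summary does not spell out how V-information scores individual samples. The minimal sketch below illustrates one common instantiation, pointwise V-information (PVI), where each sample is scored by how much an input-conditioned model improves over a label-only ("null input") model; high-PVI samples are kept for training. This is an illustrative assumption, not the OptiDEL algorithm itself, and the function names and toy probabilities are hypothetical.

```python
import numpy as np

def pointwise_v_information(p_null: np.ndarray, p_cond: np.ndarray) -> np.ndarray:
    """PVI(x -> y) = -log p_null(y) + log p_cond(y).

    p_null[i]: probability the label-only (null-input) model assigns
               to sample i's true label.
    p_cond[i]: probability the input-conditioned model assigns
               to the same label.
    """
    return np.log(np.asarray(p_cond)) - np.log(np.asarray(p_null))

def select_informative(p_null, p_cond, k: int) -> np.ndarray:
    """Keep the indices of the k samples with the highest PVI scores."""
    pvi = pointwise_v_information(p_null, p_cond)
    return np.argsort(pvi)[::-1][:k]

# Toy usage: 6 samples, keep the 3 most informative.
p_null = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]     # label marginal from null model
p_cond = [0.95, 0.55, 0.9, 0.5, 0.8, 0.45]  # input-conditioned probabilities
print(select_informative(p_null, p_cond, k=3))  # indices of high-PVI samples
```

Under this scheme, samples whose labels the conditioned model predicts no better than the null model carry little usable information and are natural candidates to drop, which is consistent with the observation that more pre-training data does not automatically help.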