Instruction tuning has been crucial for enhancing large language models (LLMs), but current iterative data selection methods are computationally intensive.
A new framework, LEAD, addresses this by estimating sample utility accurately within the standard training loop, rather than through separate, costly selection passes.
LEAD estimates utility with Instance-Level Dynamic Uncertainty (IDU), which combines multiple training-time signals into a single per-sample score, and pairs it with a two-stage selection strategy to keep the search over candidate data efficient.
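To make the idea concrete, below is a minimal Python sketch of how an uncertainty-based utility score and a two-stage (cluster-then-instance) selection could be wired together. The function names, the blend of current and smoothed losses, and the cluster-based coarse stage are illustrative assumptions for exposition, not the authors' implementation.

```python
def idu_score(current_loss, loss_history, smoothing=0.9, mix=0.5):
    """Illustrative instance-level dynamic uncertainty (IDU) score.

    Blends a sample's current training loss with an exponentially smoothed
    history of its past losses, so samples whose loss stays high or keeps
    fluctuating look more "uncertain" and hence more useful to keep training
    on. The smoothing factor and mixing weight are illustrative assumptions,
    not the paper's exact formula.
    """
    smoothed = loss_history[0] if loss_history else current_loss
    for past_loss in loss_history[1:]:
        smoothed = smoothing * smoothed + (1.0 - smoothing) * past_loss
    return mix * current_loss + (1.0 - mix) * smoothed


def two_stage_select(scores, cluster_ids, budget):
    """Illustrative coarse-to-fine selection: rank clusters by mean score,
    then take the highest-scoring samples within them until `budget` is hit."""
    clusters = {}
    for idx, cid in enumerate(cluster_ids):
        clusters.setdefault(cid, []).append(idx)

    # Stage 1 (coarse): order clusters by their average utility score.
    ranked = sorted(clusters.values(),
                    key=lambda idxs: -sum(scores[i] for i in idxs) / len(idxs))

    # Stage 2 (fine): within that order, keep the top-scoring samples.
    selected = []
    for idxs in ranked:
        for i in sorted(idxs, key=lambda i: -scores[i]):
            if len(selected) == budget:
                return selected
            selected.append(i)
    return selected
```

Because the scores come from losses already computed during training, they can be refreshed as training proceeds, so the selected subset adapts to the model's current state rather than being fixed up front.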
Experiments show that LEAD outperforms existing methods, improving model performance by 6.1%-10.8% while using only 2.5% of the training data and substantially reducing overall training time.