Training data attribution (TDA) methods aim to identify influential training examples for a model's predictions on specific test data.
Gradient-based TDA methods are difficult to scale, and recent methods that rely on random projections to reduce this cost often sacrifice attribution accuracy.
Daunce is a data attribution approach based on uncertainty estimation: it fine-tunes a collection of perturbed models and uses the covariance of losses across these models as the attribution score.
Daunce scales to large language models, achieves more accurate attribution than existing TDA methods, and has been successfully applied to OpenAI's GPT models.
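The covariance-of-losses idea can be illustrated with a minimal toy sketch. It assumes a small linear least-squares model standing in for a fine-tuned LLM, and it uses random positive per-example loss weights as the perturbation. The function names (`fit_perturbed_models`, `per_example_losses`, `covariance_scores`) and this perturbation scheme are hypothetical illustrations, not the Daunce authors' implementation. A positive score means the training loss and the test loss tend to rise and fall together across the perturbed models.

```python
import numpy as np


def fit_perturbed_models(X, y, n_models=32, ridge=1e-2, seed=0):
    """Fit a family of randomly perturbed weighted ridge-regression models.

    Assumption: the "perturbation" is a random positive weight on each
    training example's loss; this stands in for perturbed fine-tuning.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    thetas = []
    for _ in range(n_models):
        w = rng.exponential(scale=1.0, size=n)  # hypothetical per-example weights, mean 1
        Xw = X * w[:, None]
        # Solve (X^T W X + ridge * I) theta = X^T W y
        A = X.T @ Xw + ridge * np.eye(d)
        b = Xw.T @ y
        thetas.append(np.linalg.solve(A, b))
    return np.stack(thetas)  # shape (n_models, d)


def per_example_losses(thetas, X, y):
    """Squared-error loss of every model on every example, shape (n_models, n)."""
    preds = X @ thetas.T  # shape (n, n_models)
    return ((preds - y[:, None]) ** 2).T


def covariance_scores(train_losses, test_losses):
    """Sample covariance across models between each train loss and each test loss.

    Returns an array of shape (n_train, n_test); larger means more influential
    under this toy scoring rule.
    """
    k = train_losses.shape[0]
    tr = train_losses - train_losses.mean(axis=0, keepdims=True)
    te = test_losses - test_losses.mean(axis=0, keepdims=True)
    return tr.T @ te / (k - 1)


# Usage on synthetic data: the two test points are copies of training points.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(50, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
X_test, y_test = X_train[:2], y_train[:2]

thetas = fit_perturbed_models(X_train, y_train)
train_losses = per_example_losses(thetas, X_train, y_train)
test_losses = per_example_losses(thetas, X_test, y_test)
scores = covariance_scores(train_losses, test_losses)
top5 = np.argsort(-scores, axis=0)[:5]  # top-5 training indices per test point
```

Because no gradients are stored or projected, the cost of this sketch is dominated by fitting the perturbed models and evaluating their losses; how closely that mirrors Daunce's actual cost profile on LLMs is an assumption here.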