Vision foundation models pre-trained on massive data can be fine-tuned for downstream tasks, but fine-tuning often causes the model to forget concepts needed for other tasks.
Recent methods try to prevent forgetting without hurting fine-tuning performance by matching the fine-tuned model's weights or features to those of the pre-trained model, but such point-wise matching can be an overly strong constraint.
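For context, a minimal PyTorch sketch of such point-wise matching regularizers is below; the function names, the `pretrained_state` argument, and the loss weight are illustrative assumptions, not any specific method's implementation:

```python
import torch.nn.functional as F

def weight_matching_penalty(model, pretrained_state, strength=1e-3):
    """L2 penalty pulling fine-tuned weights back toward pre-trained values
    (in the spirit of L2-SP-style regularizers; names are hypothetical)."""
    penalty = sum((p - pretrained_state[n].detach()).pow(2).sum()
                  for n, p in model.named_parameters())
    return strength * penalty

def feature_matching_penalty(feats_ft, feats_pre):
    """Point-wise feature distillation: each fine-tuned feature is pushed
    toward the exact pre-trained feature for the same input."""
    return F.mse_loss(feats_ft, feats_pre.detach())
```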
Proxy-FDA is a new regularization method that explicitly preserves structural knowledge in feature space: it performs Feature Distribution Alignment (FDA) between the pre-trained and fine-tuned feature spaces using nearest neighbor graphs, and further improves the alignment with informative proxies generated dynamically to increase data diversity.
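As a rough illustration of the general idea (not the paper's exact objective; the loss form, temperature, and neighbor count are assumptions), one way to align feature distributions via nearest neighbor graphs is to build a kNN graph over a batch of frozen pre-trained features and train the fine-tuned features to reproduce the same neighborhood similarity structure:

```python
import torch
import torch.nn.functional as F

def knn_graph_alignment_loss(feats_ft, feats_pre, k=5, tau=0.1):
    """Align local neighborhood structure between fine-tuned and pre-trained
    feature spaces over a batch (a sketch, not the exact Proxy-FDA loss).

    feats_ft:  (B, D) features from the model being fine-tuned.
    feats_pre: (B, D) features from the frozen pre-trained model.
    Assumes batch size B > k.
    """
    z_ft = F.normalize(feats_ft, dim=1)
    z_pre = F.normalize(feats_pre, dim=1).detach()

    sim_ft = z_ft @ z_ft.t()                      # (B, B) cosine similarities
    sim_pre = z_pre @ z_pre.t()

    # exclude self-similarity when picking each sample's neighbors
    eye = torch.eye(sim_pre.size(0), dtype=torch.bool, device=sim_pre.device)
    idx = sim_pre.masked_fill(eye, float('-inf')).topk(k, dim=1).indices

    # neighbor distributions restricted to the pre-trained kNN graph
    p = F.softmax(sim_pre.gather(1, idx) / tau, dim=1)        # target structure
    log_q = F.log_softmax(sim_ft.gather(1, idx) / tau, dim=1)
    return F.kl_div(log_q, p, reduction='batchmean')          # KL(p || q)
```

Unlike the point-wise losses sketched above, a structural objective like this constrains only the relative similarities among samples, leaving individual features free to move as the downstream task requires.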
Experiments show that Proxy-FDA significantly reduces concept forgetting during fine-tuning, with consistent benefits across fine-tuning settings (end-to-end, few-shot, and continual tuning) and across tasks such as image classification, captioning, and VQA.