Finetuning large pretrained neural networks can be costly in both memory and computation.
Researchers have observed a correlation between large gradients and small-magnitude weights during finetuning.
NANOADAM is proposed as a method that dynamically updates only the small-magnitude weights during finetuning; because the updated subset is chosen from the weight magnitudes themselves, it can be determined without computing gradients, and the method achieves better generalization performance in experiments.
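The gradient-free subset selection can be illustrated with a short sketch. The quantile-based threshold below is a hypothetical choice for illustration, not necessarily the criterion used by NANOADAM; the point is that the mask depends only on current weight values, not on gradients.

```python
import torch

def small_magnitude_mask(param: torch.Tensor, quantile: float = 0.5) -> torch.Tensor:
    """Boolean mask marking the smallest-magnitude entries of `param`.

    Selection uses only the current weight values, so no gradient computation
    is needed. The median (quantile=0.5) threshold is a placeholder choice,
    not the paper's criterion.
    """
    magnitudes = param.detach().abs()
    threshold = magnitudes.flatten().quantile(quantile)
    return magnitudes <= threshold
```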
NANOADAM shows benefits on both NLP and vision tasks: it preserves the large-magnitude weights that encode critical features learned during pretraining, and it enables the use of larger learning rates.
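A minimal sketch of how such a masked update could be wired into a standard Adam optimizer is given below. The class `MaskedAdam` and its gradient-masking strategy are assumptions made for illustration; the actual NANOADAM update rule is not reproduced here.

```python
import torch


class MaskedAdam(torch.optim.Adam):
    """Adam that updates only a pre-selected subset of each parameter tensor.

    Hypothetical sketch: gradients of entries outside the mask (the
    large-magnitude, pretraining-critical weights) are zeroed before the
    standard Adam step, so those weights stay frozen while the small-magnitude
    subset is trained, potentially with a larger learning rate. Assumes
    weight decay is zero; otherwise frozen entries would still be modified.
    """

    def __init__(self, params, masks, lr=1e-3, **kwargs):
        params = list(params)
        super().__init__(params, lr=lr, **kwargs)
        # One boolean mask per parameter, aligned with `params`.
        self.masks = {id(p): m for p, m in zip(params, masks)}

    def step(self, closure=None):
        with torch.no_grad():
            for group in self.param_groups:
                for p in group["params"]:
                    if p.grad is not None:
                        # Zero gradients on large-magnitude entries; their Adam
                        # moment estimates then stay at zero and they are never updated.
                        p.grad.mul_(self.masks[id(p)].to(p.grad.dtype))
        return super().step(closure)


# Example usage (hypothetical): freeze large-magnitude weights, train the rest.
# masks = [small_magnitude_mask(p) for p in model.parameters()]
# optimizer = MaskedAdam(model.parameters(), masks, lr=5e-4)
```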