Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method that adds a trainable low-rank product $AB$ to the frozen pretrained weights; conventionally, one of the two adapter matrices, $A$ or $B$, is initialized to zero so that the product vanishes and fine-tuning starts exactly from the pretrained model.
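The conventional setup can be illustrated with a minimal PyTorch sketch (names, shapes, and the scaling convention below are illustrative assumptions, not the paper's code): because one adapter factor starts at zero, the initial update $AB$ is zero and the layer reproduces the pretrained weight.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA linear layer with the conventional zero initialization."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # Frozen pretrained weight (random here as a stand-in for a real checkpoint).
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Trainable low-rank factors; the update is A @ B.
        self.A = nn.Parameter(torch.zeros(out_features, rank))   # zero init -> A @ B = 0 at step 0
        self.B = nn.Parameter(torch.empty(rank, in_features))
        nn.init.kaiming_uniform_(self.B, a=math.sqrt(5))          # the other factor is random
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight = pretrained W + scaled low-rank update A @ B.
        return x @ (self.weight + self.scaling * (self.A @ self.B)).T
```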
A new study investigates how non-zero initialization affects LoRA's fine-tuning dynamics, finding that initializing both $A$ and $B$ to non-zero values improves robustness to suboptimal learning rates, particularly smaller ones.
Further analysis shows that with non-zero initialization the product $AB$ is non-zero at the start, which effectively adds random noise to the pretrained weights; this generally does not harm fine-tuning performance, suggesting that fine-tuning does not have to begin strictly from the pretrained model.
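A hedged, self-contained sketch of this non-zero variant is shown below; the Gaussian distribution and its scale are illustrative assumptions rather than the paper's exact scheme. Since both factors are non-zero, the effective weight $W + AB$ starts from a randomly perturbed copy of the pretrained weights instead of $W$ itself.

```python
import torch
import torch.nn as nn

rank, d_in, d_out = 8, 64, 64
W = torch.randn(d_out, d_in)                 # stand-in for a frozen pretrained weight
A = nn.Parameter(torch.empty(d_out, rank))   # both factors are initialized non-zero
B = nn.Parameter(torch.empty(rank, d_in))
nn.init.normal_(A, std=0.02)                 # illustrative scale, not the paper's choice
nn.init.normal_(B, std=0.02)

with torch.no_grad():
    perturbation = A @ B                     # non-zero -> W + A @ B != W at step 0
print(perturbation.norm())                   # small random offset from the pretrained weights
```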
These findings are supported by extensive experiments across various models and datasets, with code available at https://github.com/Leopold1423/non_zero_lora-icml25.