Contrastive learning is a framework in which positive views (augmented views of the same example) are pulled together and negative views (views of different examples) are pushed apart in the representation space.
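As a concrete illustration, here is a minimal sketch of an InfoNCE-style contrastive loss; the function name, batch layout, and temperature value are illustrative assumptions rather than details from the text.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """Sketch of an InfoNCE-style contrastive loss.

    z1, z2: (batch, dim) embeddings of two augmented views of the same batch.
    Each pair (z1[i], z2[i]) is a positive; all other rows serve as negatives.
    """
    z1 = F.normalize(z1, dim=1)                            # L2-normalize features
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                     # pairwise cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, targets)
```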
Non-contrastive learning methods like BYOL and SimSiam eliminate negative examples to improve computational efficiency.
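A minimal sketch of such a negative-free objective in the style of SimSiam is shown below, assuming predictor outputs p1, p2 and projector outputs z1, z2 (names chosen for illustration): only positive pairs are used, together with a stop-gradient on the target branch.

```python
import torch.nn.functional as F

def simsiam_style_loss(p1, z1, p2, z2):
    """Sketch of a non-contrastive, negative-free loss with stop-gradient."""
    def neg_cos(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=1).mean()  # stop-gradient on z
    return 0.5 * neg_cos(p1, z2) + 0.5 * neg_cos(p2, z1)          # symmetrized over views
```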
An analysis by Tian et al. showed that collapse of the learned representations can be prevented when data augmentation is sufficiently strong relative to regularization.
However, that analysis did not take into account feature normalization, a key step applied before measuring the similarity of representations.
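To make the normalization step concrete, the cosine loss can be viewed as L2-normalizing each feature before taking the inner product, so the loss depends only on the direction of the representations. The following is a minimal sketch; variable names are illustrative.

```python
import torch.nn.functional as F

def cosine_loss(f1, f2):
    """Negative cosine similarity: normalize each feature, then take the inner product."""
    f1 = F.normalize(f1, dim=1)   # feature normalization
    f2 = F.normalize(f2, dim=1)
    return -(f1 * f2).sum(dim=1).mean()
```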
Under that un-normalized analysis, excessively strong regularization may collapse the dynamics, an outcome that is unnatural once feature normalization is present.
The study introduces a new theory based on the cosine loss, which incorporates feature normalization, and shows that this loss induces sixth-order learning dynamics.
Under these dynamics, a stable equilibrium emerges even when the given initial parameters admit only collapsed solutions.
The research thus highlights the pivotal role of feature normalization in robustly preventing collapse of the learning dynamics.