Researchers propose an approach, inspired by the concept of the nullspace from linear algebra, to enhance the robustness of vision transformers (ViTs).
The investigation asks whether ViTs can be resilient to input variations in the same way that a linear mapping is unaffected by perturbations drawn from its nullspace.
The researchers extend the notion of nullspace to nonlinear settings and show that approximate nullspace elements for a ViT's encoder blocks can be synthesized through optimization.
They also propose a finetuning strategy for ViTs that augments the training data with synthesized approximate nullspace noise, which improves robustness.
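
To make the idea concrete: for a linear map W, any v with Wv = 0 satisfies W(x + v) = Wx, so adding v to the input leaves the output unchanged. The sketch below illustrates the nonlinear analogue on a toy problem, searching for a fixed-norm noise vector v that keeps f(x + v) close to f(x). Everything in it is an illustrative assumption rather than the researchers' actual procedure: the toy `encoder` (a `tanh` over a small matrix `W`) stands in for a ViT encoder block, the finite-difference gradient stands in for automatic differentiation, and the rescaling step is just one simple way to keep the noise from shrinking to zero.

```python
import math
import random

# Toy stand-in for an encoder block (an assumption, not a real ViT):
# f(x) = tanh(W x), with W mapping 4 inputs to 2 outputs.
W = [[1.0, 2.0, 0.0, -1.0],
     [0.0, 1.0, 1.0, 2.0]]


def encoder(x):
    """Apply the toy nonlinear map f(x) = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def discrepancy(x, v):
    """Squared distance between f(x + v) and f(x)."""
    shifted = encoder([xi + vi for xi, vi in zip(x, v)])
    clean = encoder(x)
    return sum((a - b) ** 2 for a, b in zip(shifted, clean))


def rescale(v, norm):
    """Rescale v to a fixed Euclidean norm so it cannot collapse to zero."""
    length = math.sqrt(sum(vi * vi for vi in v))
    return [vi * norm / length for vi in v]


def synthesize_noise(x, norm=1.0, steps=500, lr=0.1, eps=1e-5, seed=0):
    """Gradient descent on the discrepancy, rescaling v after every step.

    Gradients are central finite differences, a stand-in for autograd.
    """
    rng = random.Random(seed)
    v = rescale([rng.gauss(0.0, 1.0) for _ in x], norm)
    for _ in range(steps):
        grad = []
        for i in range(len(v)):
            plus, minus = v[:], v[:]
            plus[i] += eps
            minus[i] -= eps
            grad.append(
                (discrepancy(x, plus) - discrepancy(x, minus)) / (2 * eps)
            )
        v = rescale([vi - lr * gi for vi, gi in zip(v, grad)], norm)
    return v


x = [0.5, -0.2, 0.1, 0.3]

# Exact linear case: W v_exact = 0, so the output is unchanged
# up to floating-point rounding.
v_exact = [-2.0, 1.0, -1.0, 0.0]
print("exact nullspace vector:", discrepancy(x, v_exact))

# Optimized case: a unit-norm v found by descent on the discrepancy.
v_opt = synthesize_noise(x)
print("synthesized noise:", discrepancy(x, v_opt))

# Augmented sample of the kind that would be added to finetuning data.
x_augmented = [xi + vi for xi, vi in zip(x, v_opt)]
```

In this toy setting the optimized noise ends up in the exact nullspace of `W`, because `tanh` is applied elementwise after the linear map. For a real encoder block no such exact subspace is guaranteed, which is why the elements are described as approximate.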