Researchers propose a discrete cosine transform (DCT)-based approach for initializing and compressing the attention mechanism in Vision Transformers. The DCT-based initialization improves accuracy on classification tasks, while the compression technique exploits the DCT's ability to decorrelate image information in the frequency domain: because most of the signal energy concentrates in the low-frequency components, the higher-frequency components can be truncated with little loss. Applying this compression to the weight matrices for queries, keys, and values reduces their size and thereby decreases the computational overhead of attention.
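The truncation idea can be illustrated with a minimal sketch: transform a matrix with a 2-D DCT, keep only a low-frequency block of coefficients, and reconstruct an approximation from that smaller block. This is a hypothetical illustration of DCT-based compression in general, not the paper's exact method; the function names and the `keep` parameter are assumptions for demonstration.

```python
import numpy as np
from scipy.fft import dct, idct
from scipy.ndimage import gaussian_filter

def dct_compress(W, keep):
    """Keep only the lowest-frequency keep x keep block of the 2-D DCT
    of W (illustrative stand-in for compressing a weight matrix)."""
    # Type-II orthonormal DCT applied along both axes
    C = dct(dct(W, axis=0, norm="ortho"), axis=1, norm="ortho")
    return C[:keep, :keep]  # retain the low-frequency block

def dct_decompress(C_trunc, shape):
    """Reconstruct an approximation of the original matrix by
    zero-padding the truncated coefficients and inverting the DCT."""
    C = np.zeros(shape)
    k0, k1 = C_trunc.shape
    C[:k0, :k1] = C_trunc
    return idct(idct(C, axis=1, norm="ortho"), axis=0, norm="ortho")

rng = np.random.default_rng(0)
# A smoothed random matrix, so its energy concentrates in low frequencies
W = gaussian_filter(rng.standard_normal((64, 64)), sigma=4)

C = dct_compress(W, keep=16)        # store 16x16 instead of 64x64
W_hat = dct_decompress(C, W.shape)
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```

With 16x smaller storage (a 16x16 coefficient block instead of the full 64x64 matrix), the reconstruction error stays small because the smoothed matrix has little high-frequency content, which is the decorrelation property the paragraph above relies on.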