Sparse autoencoders (SAEs) have helped advance our understanding of the mechanisms underlying vision and language processing. SAEs trained on CLIP's vision transformer reveal distinct sparsity patterns across layers and token types. We introduce metrics to quantify the steerability of SAE features, finding that 10-15% of neurons and features are steerable. Targeted suppression of SAE features improves performance on vision disentanglement tasks and on defense against typographic attacks.
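To make the suppression idea concrete, the following is a minimal, hypothetical PyTorch sketch: a toy SAE encodes a layer's activations, zeroes a chosen feature dimension, and writes the reconstruction back into the forward pass via a hook. The SAE architecture, dimensions, hooked layer, and feature index are all illustrative assumptions, not the setup described above.

```python
# Minimal sketch (not the authors' implementation) of targeted SAE feature
# suppression. All module names, dimensions, and weights are placeholders.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """Toy SAE with a ReLU bottleneck (illustrative dimensions)."""

    def __init__(self, d_model: int = 768, d_sae: int = 3072):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.enc(x))

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        return self.dec(f)


def make_suppression_hook(sae: SparseAutoencoder, feature_ids: list):
    """Forward hook that zeroes the chosen SAE features and reconstructs."""

    def hook(module, inputs, output):
        feats = sae.encode(output)      # (batch, tokens, d_sae)
        feats[..., feature_ids] = 0.0   # targeted suppression of SAE features
        return sae.decode(feats)        # replace the layer's output
    return hook


if __name__ == "__main__":
    # Stand-in for one transformer sublayer; in practice the hook would be
    # registered at the CLIP ViT layer on which the SAE was trained.
    layer = nn.Linear(768, 768)
    sae = SparseAutoencoder()
    handle = layer.register_forward_hook(make_suppression_hook(sae, [123]))

    tokens = torch.randn(1, 50, 768)    # (batch, tokens, d_model)
    steered = layer(tokens)             # output with feature 123 suppressed
    handle.remove()
    print(steered.shape)
```

Returning a value from the forward hook replaces the layer's output, so downstream computation sees the edited activations; this is one common way to intervene on a frozen model without modifying its weights.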