Effective and reliable control over large language model (LLM) behavior is a significant challenge.
Existing activation steering methods lack the precision and interpretability needed to reliably influence model outputs.
Feature Guided Activation Additions (FGAA) is a novel activation steering method that achieves stronger steering effects while preserving the coherence of model outputs.
FGAA outperforms existing steering methods, namely Contrastive Activation Addition (CAA), SAE decoder steering, and SAE-TS, on steering tasks across a range of models.
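As general context (this is not the FGAA-specific construction, which the text above does not detail), activation steering methods work by adding a scaled steering vector to a model's hidden activations during the forward pass. A minimal NumPy sketch of this generic activation-addition mechanism, using a hypothetical one-layer toy model in place of an LLM layer:

```python
import numpy as np

def toy_layer(x, W):
    """Toy stand-in for a transformer layer: a linear map plus nonlinearity."""
    return np.tanh(x @ W)

def forward(x, W, steering_vector=None, alpha=0.0):
    """Run the toy layer, optionally adding a scaled steering vector
    to its activations (the core of activation-addition steering)."""
    h = toy_layer(x, W)
    if steering_vector is not None:
        h = h + alpha * steering_vector  # shift the hidden state along the steering direction
    return h

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))
x = rng.normal(size=(1, d))
v = rng.normal(size=(d,))
v = v / np.linalg.norm(v)  # unit-norm steering direction

plain = forward(x, W)
steered = forward(x, W, steering_vector=v, alpha=2.0)
print(np.allclose(plain, steered))  # False: steering shifts the activations
```

In real LLM steering the vector is injected into a chosen layer's residual stream (e.g. via a forward hook), and methods differ mainly in how the steering vector is constructed; per the text above, FGAA's contribution is a feature-guided construction that steers more effectively without degrading output coherence.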