<ul><li>Researchers have developed a method that improves instruction-following in language models through activation steering.</li><li>The method derives a vector representation for each instruction from the language model's own activations and uses these vectors to steer the model toward following that instruction.</li><li>Because each activation vector is computed as the difference between the model's activations on inputs with and without an instruction, the approach to activation steering is modular.</li><li>The approach improves the model's adherence to constraints such as output format, output length, and the inclusion of specific words, offering control over instruction following.</li></ul>
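The difference-of-activations idea summarized above can be sketched as follows. This is a minimal illustration, not the researchers' implementation. `toy_activation` is a hypothetical stand-in for reading one layer's hidden state from a real model, and the helper names (`steering_vector`, `steer`), the example prompts, and the per-vector scalar weights are all assumptions made for this sketch. Each vector is the mean difference between activations on prompts with and without an instruction. Steering adds weighted vectors back into an activation, and adding two vectors at once illustrates the modular use.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def toy_activation(text: str) -> Vector:
    """Hypothetical stand-in for a model's hidden state at one layer (last token).

    A real implementation would run the language model and read the activation
    from a chosen layer; three hand-made features keep this example runnable.
    """
    lower = text.lower()
    return [
        len(text.split()) / 10.0,            # crude "prompt length" feature
        1.0 if "json" in lower else 0.0,     # "format instruction present" feature
        1.0 if "briefly" in lower else 0.0,  # "length instruction present" feature
    ]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def steering_vector(
    get_activation: Callable[[str], Vector],
    pairs: Sequence[Tuple[str, str]],
) -> Vector:
    """Average of (activation with instruction - activation without it) over pairs."""
    diffs = [
        [a - b for a, b in zip(get_activation(with_instr), get_activation(without_instr))]
        for with_instr, without_instr in pairs
    ]
    return mean(diffs)


def steer(activation: Vector, vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Add each steering vector, scaled by its (assumed) weight, to an activation."""
    steered = list(activation)
    for vector, weight in zip(vectors, weights):
        steered = [x + weight * v for x, v in zip(steered, vector)]
    return steered


# One vector per instruction type, each derived from its own paired prompts.
format_vec = steering_vector(toy_activation, [
    ("Answer in JSON. What is 2+2?", "What is 2+2?"),
    ("Answer in JSON. Name a color.", "Name a color."),
])
length_vec = steering_vector(toy_activation, [
    ("Answer briefly. Name a color.", "Name a color."),
])

# Modularity: steer a prompt that carries no instruction with one or both vectors.
base = toy_activation("Name a color.")
format_only = steer(base, [format_vec], [1.0])
combined = steer(base, [format_vec, length_vec], [1.0, 1.0])
```

Because each instruction type gets its own vector, `format_only` shifts only the format feature, while `combined` also shifts the length feature. In a real model the addition would happen inside the forward pass at the chosen layer (for example through a PyTorch forward hook), which this sketch leaves out.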

Improving Instruction-Following in Language Models through Activation Steering
