<ul data-eligibleForWebStory="false"><li>Inference-time intervention (ITI) is a method for steering large language model (LLM) behavior without updating model parameters.</li><li>Multi-Attribute Targeted Steering (MAT-Steer) is introduced to handle conflicts in multi-attribute settings by intervening selectively at the token level.</li><li>MAT-Steer uses alignment objectives to shift model representations, reducing conflicts between attributes such as helpfulness and toxicity.</li><li>MAT-Steer outperforms existing ITI and fine-tuning approaches across question-answering and generative tasks.</li></ul>

Multi-Attribute Steering of Language Models via Targeted Intervention
