Arxiv

2d

Multi-Attribute Steering of Language Models via Targeted Intervention

  • Inference-time intervention (ITI) is a method for steering the behavior of large language models (LLMs) without updating model parameters.
  • Multi-Attribute Targeted Steering (MAT-Steer) is introduced to handle conflicts in multi-attribute settings by selectively intervening at the token level.
  • MAT-Steer uses alignment objectives to shift model representations, reducing conflicts between attributes such as helpfulness and toxicity.
  • MAT-Steer outperforms existing ITI and fine-tuning approaches across question answering and generative tasks.
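
To make the idea of selective, token-level intervention concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function `steer_tokens`, the sigmoid gate, the dictionary keys (`gate_w`, `gate_b`, `direction`, `strength`), and the threshold are all hypothetical names chosen for this example. The only ideas taken from the summary above are that model parameters stay fixed and that each token's representation is shifted only when an attribute's intervention applies to it.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def steer_tokens(hidden_states, attributes, threshold=0.5):
    """Shift each token's hidden state along per-attribute directions.

    Hypothetical sketch: every attribute has a linear gate that scores the
    ORIGINAL token representation. The shift is applied only when the gate
    exceeds the threshold, so tokens the gate rejects are left unchanged.
    No model weights are modified; only activations are edited.
    """
    steered = []
    for h in hidden_states:
        new_h = list(h)  # copy so the input is not mutated
        for attr in attributes:
            gate = sigmoid(dot(attr["gate_w"], h) + attr["gate_b"])
            if gate > threshold:
                for i, v in enumerate(attr["direction"]):
                    new_h[i] += gate * attr["strength"] * v
        steered.append(new_h)
    return steered


# Toy example: two tokens with 2-dimensional hidden states and one attribute.
hidden_states = [[2.0, 0.0], [-2.0, 0.0]]
attributes = [
    {
        "gate_w": [1.0, 0.0],     # gate fires when the first coordinate is large
        "gate_b": 0.0,
        "direction": [0.0, 1.0],  # steering direction for this attribute
        "strength": 1.0,
    }
]
steered = steer_tokens(hidden_states, attributes)
```

In this toy run the gate fires for the first token, which is moved along the steering direction, while the second token is left as it was. With several attributes in the list, each one decides independently which tokens it touches, which is one simple way to keep interventions for different attributes from overwriting each other on every token.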
