Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs

Source: arXiv

  • Activation steering offers an alternative to costly fine-tuning for controlling Large Language Model (LLM) behavior at inference time.
  • A lightweight, trainable controller network is introduced to dynamically modulate the intensity of a steering patch across the LLM's layers during generation.
  • The controller predicts a global scaling factor and layer-specific weights, enabling nuanced, layer-aware interventions that are applied primarily to harmful inputs.
  • Experiments show that this weighted steering controller significantly increases refusal rates compared to the base LLM, offering an efficient method for fine-grained control over LLM behavior.
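The mechanism described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes that the steering patch is a single vector added to each layer's hidden state, that the controller reads a pooled feature vector of the prompt, and that both the global scale and the per-layer weights come from sigmoid-activated linear heads. The names `WeightedSteeringController` and `apply_steering` are invented for this example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Logistic function, used to keep every controller output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class WeightedSteeringController:
    """Hypothetical controller (an assumption, not the paper's architecture):
    one linear head for the global scale, one linear head per layer for the
    layer-specific weights."""
    global_w: Vector
    global_b: float
    layer_w: List[Vector]  # one weight row per transformer layer
    layer_b: Vector

    def predict(self, features: Vector) -> Tuple[float, Vector]:
        scale = sigmoid(dot(self.global_w, features) + self.global_b)
        weights = [sigmoid(dot(w, features) + b)
                   for w, b in zip(self.layer_w, self.layer_b)]
        return scale, weights


def apply_steering(hidden: List[Vector], patch: Vector,
                   scale: float, weights: Vector) -> List[Vector]:
    """Assumed additive steering: add scale * weight_l * patch at each layer l."""
    steered = []
    for h, w_l in zip(hidden, weights):
        alpha = scale * w_l  # effective steering strength at this layer
        steered.append([x + alpha * v for x, v in zip(h, patch)])
    return steered


# Toy usage: 2 layers, 3-dim hidden states, 2-dim prompt features.
# All-zero parameters make every sigmoid output 0.5, so alpha = 0.25 per layer.
controller = WeightedSteeringController(
    global_w=[0.0, 0.0], global_b=0.0,
    layer_w=[[0.0, 0.0], [0.0, 0.0]], layer_b=[0.0, 0.0])
scale, weights = controller.predict([1.0, -1.0])
steered = apply_steering([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
                         patch=[4.0, 0.0, -4.0], scale=scale, weights=weights)
print(steered)  # [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]
```

In a real system the controller's parameters would be learned rather than set by hand; they are fixed here only to keep the arithmetic easy to follow. Because the global scale multiplies every layer's weight, a controller of this shape could push the scale toward zero for inputs it treats as benign, effectively switching steering off while leaving the base model untouched.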
