Steering vectors are a lightweight method for controlling text properties by adding a learned bias to language model activations at inference time.
The evaluation moves beyond multiple-choice settings to assess steering vectors on adaptive free-form summarization tasks.
Steering vectors effectively control topical focus, sentiment, toxicity, and readability in abstractive summaries, but high steering strengths can degrade text quality.
Combining steering and prompting offers the strongest control over text properties with a favorable efficacy-quality trade-off at moderate steering strengths.
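
To make the mechanism concrete, the following is a minimal sketch of adding a steering vector to activations, scaled by a steering strength. It uses plain Python lists as toy activations rather than a real language model. The helper names (`mean_difference`, `steer`) are hypothetical, and deriving the vector as a difference of mean activations between contrastive examples is an assumption made for illustration; the text above only states that the bias is learned.

```python
from typing import List

Vector = List[float]


def mean_difference(pos: List[Vector], neg: List[Vector]) -> Vector:
    """Assumed recipe (not confirmed by the source): the steering vector is
    the mean activation on 'positive' examples minus the mean activation
    on 'negative' examples."""
    dim = len(pos[0])
    mean_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def steer(hidden_states: List[Vector],
          steering_vector: Vector,
          strength: float) -> List[Vector]:
    """Add strength * steering_vector to the activation at every token position."""
    return [
        [h + strength * s for h, s in zip(token, steering_vector)]
        for token in hidden_states
    ]


# Toy activations (hidden size 2) for contrastive examples.
pos = [[1.0, 2.0], [3.0, 4.0]]
neg = [[0.0, 1.0], [2.0, 1.0]]
vec = mean_difference(pos, neg)

# Toy hidden states for a two-token sequence at one layer.
hidden = [[0.0, 0.0], [1.0, -1.0]]
steered = steer(hidden, vec, strength=2.0)
print(vec, steered)
```

The `strength` argument corresponds to the steering strength discussed above: a value of zero leaves the activations unchanged, while larger values push them further along the steering direction, which is where the reported quality degradation becomes a concern.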