Arxiv

Shape it Up! Restoring LLM Safety during Finetuning

  • Finetuning large language models (LLMs) introduces critical safety risks as even a few harmful examples can compromise safety alignment.
  • Static safety shaping, which applies the same update to the harmful and harmless parts of a response, is deemed suboptimal because the safety context can shift within a single example.
  • The proposed dynamic safety shaping (DSS) framework uses fine-grained safety signals to reinforce learning from the safe segments of a response while suppressing unsafe content.
  • The Safety Trajectory Assessment of Response (STAR) token-level signal enables shaping to operate dynamically over the training sequence, leading to substantial safety improvements without compromising task capability.
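
To make the contrast between static and dynamic shaping concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names, the 0.5 threshold, and the choice to simply mask unsafe tokens out of the loss are all assumptions made for illustration, and the per-token safety scores stand in for whatever a token-level signal such as STAR would provide.

```python
from typing import Sequence


def static_shaped_loss(token_logprobs: Sequence[float], example_weight: float) -> float:
    """Static shaping (illustrative): one weight for the whole response.

    Every token, safe or unsafe, contributes equally to the update.
    """
    nll = [-lp for lp in token_logprobs]
    return example_weight * sum(nll) / len(nll)


def dynamic_shaped_loss(
    token_logprobs: Sequence[float],
    safety_scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> float:
    """Dynamic shaping (illustrative): per-token weights from a safety signal.

    Tokens whose safety score falls below the threshold are masked out,
    so only the safe segments of the response drive the update.
    """
    if len(token_logprobs) != len(safety_scores):
        raise ValueError("need one safety score per token")
    weights = [1.0 if s >= threshold else 0.0 for s in safety_scores]
    kept = sum(weights)
    if kept == 0:
        return 0.0  # nothing safe to learn from in this response
    weighted_nll = sum(-lp * w for lp, w in zip(token_logprobs, weights))
    return weighted_nll / kept


# A response that starts safe and turns unsafe at the last token.
logprobs = [-1.0, -2.0, -3.0]
scores = [0.9, 0.8, 0.1]
static_loss = static_shaped_loss(logprobs, example_weight=1.0)
dynamic_loss = dynamic_shaped_loss(logprobs, scores)
```

In this toy case the static loss averages over all three tokens, while the dynamic loss averages only over the first two, so the unsafe final token no longer pulls on the model. A real system would likely do something softer than a hard mask for the unsafe tokens.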
