techminis

A naukri.com initiative

Arxiv


Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

  • Large language models (LLMs) can distinguish past from future events, with probes on model activations reaching 90% accuracy.
  • Models trained with backdoors triggered by a temporal distributional shift activate them when exposed to news headlines dated after their training cut-off.
  • Fine-tuning on helpful, harmless, and honest (HHH) data is effective at removing the backdoors from these models.
  • The results suggest that standard safety measures are enough to remove such backdoors, at least in models at the modest scale tested.
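
One way a past-versus-future distinction like the one in the first point can be measured is with a linear probe: a simple classifier trained on a model's internal activations. The sketch below is only a toy illustration of that idea and is not the paper's setup. It assumes synthetic 4-dimensional vectors standing in for activations, and every name in it (`make_synthetic_activations`, `train_linear_probe`, `probe_accuracy`) is hypothetical.

```python
import math
import random


def make_synthetic_activations(n, dim, rng):
    """Return a list of (vector, label) pairs; label 1 = 'future', 0 = 'past'.

    The vectors are purely synthetic stand-ins for model activations
    (an assumption for illustration, not real LLM data).
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label == 1 else -1.0
        vec = [rng.gauss(centre, 1.0) for _ in range(dim)]
        data.append((vec, label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_probe(data, dim, epochs=50, lr=0.1):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, vec)) + b)
            err = p - label  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, vec)]
            b -= lr * err
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of examples whose side of the hyperplane matches the label."""
    correct = 0
    for vec, label in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, vec)) + b > 0 else 0
        correct += int(pred == label)
    return correct / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_synthetic_activations(400, 4, rng)
test = make_synthetic_activations(200, 4, rng)
w, b = train_linear_probe(train, 4)
accuracy = probe_accuracy(w, b, test)
print(f"held-out probe accuracy: {accuracy:.2f}")
```

High held-out accuracy here only shows that the two synthetic clusters are linearly separable. With a real model, the same procedure would be run on activations recorded while the model reads headlines from before and after its training cut-off.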
