techminis — A naukri.com initiative

Image Credit: Arxiv
FC-Attack: Jailbreaking Multimodal Large Language Models via Auto-Generated Flowcharts

  • Multimodal Large Language Models (MLLMs) are vulnerable to jailbreak attacks where harmful content can be induced, posing safety risks despite safety alignment efforts.
  • A new method named FC-Attack uses auto-generated flowcharts containing partially harmful information to trick MLLMs into supplying additional harmful details.
  • FC-Attack fine-tunes a pre-trained model on benign datasets to create a step-description generator, then transforms harmful queries into flowcharts for the attack.
  • The flowcharts come in vertical, horizontal, and S-shaped forms and are combined with a benign text prompt to execute the attack on MLLMs, achieving high success rates.
  • Evaluations on AdvBench show attack success rates of up to 96% via images and up to 78% via videos across various MLLMs.
  • Factors affecting attack performance, such as the number of steps and the font style used in the flowcharts, are investigated; changing the font style improves success rates.
  • Altering the font style raises the jailbreak success rate on Claude-3.5 from 4% to 28%.
  • Several defense mechanisms, including AdaShield, help mitigate the attack; however, they may come at the cost of reduced utility.
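To make the three flowchart shapes concrete, here is a minimal, illustrative sketch (not the authors' code) that computes box positions for a list of step descriptions in vertical, horizontal, and S-shaped layouts; the rendering step, the step-description generator, and the model call are all omitted, and the box sizes and the benign prompt wording are assumptions.

```python
# Illustrative layout sketch for the three flowchart shapes described in the
# paper (vertical, horizontal, S-shaped). All dimensions are assumptions.

def layout_flowchart(steps, shape="vertical", cols=3,
                     box_w=200, box_h=60, gap=40):
    """Return (x, y) top-left coordinates for each step's box."""
    positions = []
    for i, _ in enumerate(steps):
        if shape == "vertical":
            positions.append((0, i * (box_h + gap)))
        elif shape == "horizontal":
            positions.append((i * (box_w + gap), 0))
        elif shape == "s":
            row, col = divmod(i, cols)
            if row % 2 == 1:            # reverse direction on odd rows -> S path
                col = cols - 1 - col
            positions.append((col * (box_w + gap), row * (box_h + gap)))
        else:
            raise ValueError(f"unknown shape: {shape}")
    return positions

def benign_prompt():
    # Hypothetical wording; the paper pairs the flowchart image with a
    # benign-looking text prompt that asks the model to elaborate on it.
    return "Please explain each step shown in the flowchart in more detail."

steps = [f"Step {i}" for i in range(1, 7)]
print(layout_flowchart(steps, shape="s"))
```

In this sketch the S-shape simply snakes left-to-right then right-to-left across rows, which is one plausible reading of the "S-shaped" form; the actual geometry used by the authors may differ.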
