Source: Arxiv
Multimodal Pragmatic Jailbreak on Text-to-image Models

  • Diffusion models now generate images closely aligned with textual prompts, which raises new safety concerns.
  • A novel multimodal pragmatic jailbreak prompts T2I models to produce unsafe content by combining imagery and rendered text that are each safe in isolation.
  • A benchmark dataset was created to test diffusion-based text-to-image (T2I) models under this jailbreak.
  • Nine T2I models, including commercial ones, were evaluated, and all showed a tendency to produce unsafe content.
  • Unsafe generation rates ranged from roughly 10% to 70%, with DALLE 3 among the least safe.
  • Common safeguards such as keyword blocklists and NSFW image filters were ineffective against this jailbreak, since filters designed for single-modality detection miss content that becomes unsafe only when the modalities are combined.
  • The study examines the models' text rendering capability and their training data as causes of such jailbreaks.
  • The research lays a basis for improving the security and reliability of T2I models.
  • Project page available at https://multimodalpragmatic.github.io/
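The filter-failure point above can be illustrated with a toy sketch (not the paper's actual pipeline): a hypothetical keyword blocklist checks each modality independently, so a caption and a piece of rendered text that are each harmless on their own both pass, even though their combination is what makes the output unsafe.

```python
# Toy illustration of why single-modality filters miss a multimodal
# pragmatic jailbreak. The blocklist and inputs are hypothetical.

BLOCKLIST = {"violence", "weapon", "gore"}  # hypothetical keyword filter

def passes_keyword_filter(text: str) -> bool:
    """Single-modality check: approve unless a blocked keyword appears."""
    tokens = set(text.lower().split())
    return tokens.isdisjoint(BLOCKLIST)

# A benign image caption and benign rendered text, each safe in isolation.
image_caption = "a smiling chef holding a kitchen knife"
rendered_text = "use this on your neighbor"

# Both modalities pass the filter independently, so a pipeline that
# inspects each modality separately approves the prompt, even though
# the combined image+text reading is unsafe.
assert passes_keyword_filter(image_caption)
assert passes_keyword_filter(rendered_text)
```

Detecting this class of jailbreak would require reasoning over the joint meaning of image and text, which is exactly what the evaluated single-modality filters do not do.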
