Source: Arxiv
Multimodal Pragmatic Jailbreak on Text-to-image Models

  • Diffusion models now generate images closely aligned with textual prompts, which raises new safety concerns.
  • A novel multimodal pragmatic jailbreak prompts T2I models to produce unsafe content by combining imagery and rendered text that are each safe in isolation.
  • A benchmark dataset was created to test diffusion-based text-to-image (T2I) models under this jailbreak.
  • Nine T2I models, including commercial ones, were evaluated, and all showed a tendency to produce unsafe content.
  • Unsafe generation rates ranged from roughly 10% to 70%, with DALLE 3 among the least safe.
  • Common safeguards such as keyword blocklists and NSFW image filters were ineffective against this jailbreak, since filters designed for single-modality detection miss content that becomes unsafe only when the modalities are combined.
  • The study examines the models' text rendering capability and their training data as causes of such jailbreaks.
  • The research lays a basis for improving the security and reliability of T2I models.
  • Project page available at https://multimodalpragmatic.github.io/
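The filter-failure point above can be illustrated with a toy sketch (not the paper's actual pipeline): a hypothetical keyword blocklist checks each modality independently, so a caption and a piece of rendered text that are each harmless on their own both pass, even though their combination is what makes the output unsafe.

```python
# Toy illustration of why single-modality filters miss a multimodal
# pragmatic jailbreak. The blocklist and inputs are hypothetical.

BLOCKLIST = {"violence", "weapon", "gore"}  # hypothetical keyword filter

def passes_keyword_filter(text: str) -> bool:
    """Single-modality check: approve unless a blocked keyword appears."""
    tokens = set(text.lower().split())
    return tokens.isdisjoint(BLOCKLIST)

# A benign image caption and benign rendered text, each safe in isolation.
image_caption = "a smiling chef holding a kitchen knife"
rendered_text = "use this on your neighbor"

# Both modalities pass the filter independently, so a pipeline that
# inspects each modality separately approves the prompt, even though
# the combined image+text reading is unsafe.
assert passes_keyword_filter(image_caption)
assert passes_keyword_filter(rendered_text)
```

Detecting this class of jailbreak would require reasoning over the joint meaning of image and text, which is exactly what the evaluated single-modality filters do not do.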
