Large language models (LLMs) can be prompted in specific styles, including in jailbreak queries, but the safety impact of these style patterns remains unclear.
A study evaluating 32 LLMs across seven jailbreak benchmarks found that malicious queries carrying style patterns raised the attack success rate (ASR) for nearly all models.
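As a rough illustration of the metric, ASR is the fraction of malicious queries for which a model produces a harmful (non-refusing) response. The sketch below assumes a hypothetical `is_harmful` judge function and is not the study's evaluation code.

```python
# Minimal sketch of an attack-success-rate (ASR) computation.
# `is_harmful` is a hypothetical judge function (e.g., a classifier or an
# LLM-as-judge); this is an illustration, not the study's evaluation pipeline.
from typing import Callable, Sequence


def attack_success_rate(
    responses: Sequence[str],
    is_harmful: Callable[[str], bool],
) -> float:
    """Return the fraction of responses judged harmful, i.e. successful attacks."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if is_harmful(r)) / len(responses)
```

Comparing this quantity on styled versus unstyled variants of the same malicious queries gives the ASR inflation discussed next.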
This ASR inflation correlated with both the length of the style patterns and the attention the LLMs placed on them.
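One simple way to quantify such a relationship (a sketch over assumed data, not the study's analysis) is a Pearson correlation between each style pattern's length and the ASR lift it induces:

```python
# Pearson correlation between style-pattern length and ASR lift.
# All numbers below are placeholders for illustration, not results from the study.
from statistics import correlation  # available in Python 3.10+

pattern_lengths = [12, 35, 58, 80, 120]      # hypothetical pattern lengths in tokens
asr_lift = [0.03, 0.07, 0.10, 0.14, 0.21]    # hypothetical ASR(styled) - ASR(plain)

r = correlation(pattern_lengths, asr_lift)
print(f"Pearson r between pattern length and ASR lift: {r:.2f}")
```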
The study also showed that fine-tuning LLMs on data in specific styles made them more vulnerable to jailbreaks in those same styles. To mitigate these risks, a defense strategy called SafeStyle was proposed, which consistently outperformed baselines in maintaining LLM safety.