<ul data-eligibleForWebStory="true"><li>Researchers from HiddenLayer have devised a new attack on LLM protection models called TokenBreak.</li><li>By adding or changing a single character in a prompt, an attacker can bypass certain protection models while the target LLM still understands the original intent.</li><li>The attack targets protection models that use tokenization strategies such as Byte Pair Encoding (BPE) or WordPiece.</li><li>Tokenization breaks text into smaller units, called tokens, that a model can process.</li><li>Adding a character to a keyword changes how that keyword is tokenized, so the protection model can be fooled into treating a malicious prompt as safe.</li><li>This lets malicious content slip past filters, potentially leading to malware exposure.</li><li>The end target can still interpret the manipulated text, which renders the protection model ineffective.</li><li>Models that use Unigram tokenizers were found to be more resistant to this manipulation.</li><li>Mitigation strategies include choosing protection models whose tokenization method is more resistant, such as Unigram.</li></ul>
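The single-character manipulation summarized above can be illustrated with a small Python sketch. It is not HiddenLayer's code and does not use any real model. The vocabularies, the log-probabilities, the perturbed word "finstructions", and the helper names (wordpiece_tokenize, unigram_tokenize, toy_filter_flags) are all hypothetical, and the "filter" is a crude stand-in for a learned classifier. The sketch only shows why a greedy, left-to-right tokenizer can lose a keyword that a probability-based tokenizer still recovers.

```python
import math

# Hypothetical toy vocabulary; real WordPiece vocabularies are far larger.
WORDPIECE_VOCAB = {"instructions", "fin", "f", "##struct", "##ions", "##instructions"}

# Hypothetical log-probabilities for a toy Unigram-style model.
UNIGRAM_LOGPROBS = {"instructions": -6.0, "f": -4.0, "fin": -9.0,
                    "struct": -9.0, "ions": -9.0}


def wordpiece_tokenize(word, vocab=WORDPIECE_VOCAB, unk="[UNK]"):
    """Greedy longest-match-first segmentation, left to right (WordPiece style)."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


def unigram_tokenize(word, logprobs=UNIGRAM_LOGPROBS):
    """Viterbi search for the segmentation with the highest total log-probability."""
    n = len(word)
    best = [(-math.inf, None)] * (n + 1)  # best[i] = (score, start of last piece)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in logprobs and best[start][0] > -math.inf:
                score = best[start][0] + logprobs[piece]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return ["[UNK]"]
    tokens, end = [], n
    while end > 0:
        start = best[end][1]
        tokens.append(word[start:end])
        end = start
    return tokens[::-1]


def toy_filter_flags(tokens, blocked=frozenset({"instructions"})):
    """Stand-in for a protection model: flags input only if a blocked token appears."""
    return any(token in blocked for token in tokens)


if __name__ == "__main__":
    for word in ("instructions", "finstructions"):
        wp, ug = wordpiece_tokenize(word), unigram_tokenize(word)
        print(word, wp, toy_filter_flags(wp), ug, toy_filter_flags(ug))
```

In this toy setup, the greedy tokenizer commits to the longest prefix it finds ("fin") and never emits the keyword as a token, so the stand-in filter misses it. The Unigram-style search scores whole segmentations and prefers "f" followed by the intact keyword, so the filter still fires. Real tokenizers and classifiers are more complex, and actual behaviour depends on the trained vocabulary.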

This cyberattack lets hackers crack AI models just by changing a single character
