Researchers have developed an adversarial attack that can bypass safety mechanisms in multi-agent Large Language Model (LLM) systems.
The attack optimizes prompt distribution across latency- and bandwidth-constrained network topologies to maximize attack success rate while minimizing detection risk.
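The trade-off described above can be illustrated with a toy optimization: distribute adversarial prompt fragments across agents in a bandwidth-constrained topology so that delivered fragments are maximized while accumulated detection risk is penalized. Everything below is a minimal hypothetical sketch; the agent names, costs, risk values, and scoring function are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch only: toy agents, costs, and risks are invented
# for illustration and do not come from the described research.
from itertools import product

# Each agent: (bandwidth capacity, per-fragment detection risk)
agents = {"planner": (3, 0.30), "coder": (2, 0.10), "critic": (2, 0.05)}
fragments = [2, 1, 1]          # bandwidth cost of each prompt fragment
LAMBDA = 0.5                   # trade-off between success and detection

def score(assignment):
    """Success grows with fragments delivered; detection risk accumulates."""
    load = {a: 0 for a in agents}
    risk = 0.0
    for frag_cost, agent in zip(fragments, assignment):
        load[agent] += frag_cost
        risk += agents[agent][1]
    # Infeasible if any agent exceeds its bandwidth capacity
    if any(load[a] > agents[a][0] for a in agents):
        return float("-inf")
    success = len(fragments)   # every delivered fragment contributes equally
    return success - LAMBDA * risk

# Exhaustive search over all assignments (fine at this toy scale)
best = max(product(agents, repeat=len(fragments)), key=score)
print(best, round(score(best), 3))
```

At realistic scale this exhaustive search would be replaced by a heuristic or gradient-based optimizer, but the objective shape (success minus a weighted detection penalty, subject to topology constraints) is the point of the sketch.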
The method outperforms conventional attacks, exposing critical vulnerabilities in multi-agent systems.
Existing defenses, including variants of Llama-Guard and PromptGuard, fail to block the attack.