<ul><li>ShieldAgent is a guardrail agent that enforces safety policy compliance for other autonomous agents.</li><li>It builds a safety policy model by extracting verifiable rules from policy documents, then generates a shielding plan for checking the protected agent's actions against those rules.</li><li>The authors also introduce ShieldAgent-Bench, a dataset of 3K safety-related pairs of agent instructions and action trajectories.</li><li>Experiments show that ShieldAgent outperforms prior methods at safeguarding agents, in both precision and efficiency.</li></ul>
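
To make the idea of checking an agent's action against verifiable rules concrete, here is a minimal sketch. It is not ShieldAgent's actual implementation or API: the `Action` and `Rule` types, the `shield` function, and the two example rules are all hypothetical, and rules are assumed to be hand-written predicates rather than extracted from policy documents.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical types for illustration only; not ShieldAgent's real API.
@dataclass(frozen=True)
class Action:
    name: str    # what the protected agent wants to do
    target: str  # where it wants to do it (e.g. a host name)


@dataclass(frozen=True)
class Rule:
    description: str
    # Returns True when the action complies with the rule.
    check: Callable[[Action], bool]


def shield(action: Action, rules: list[Rule]) -> list[str]:
    """Return the description of every rule the action violates."""
    return [rule.description for rule in rules if not rule.check(action)]


# Assumed example rules, written by hand for this sketch.
rules = [
    Rule("Never delete user accounts",
         lambda a: a.name != "delete_account"),
    Rule("Only act on the approved domain",
         lambda a: a.target.endswith("example.com")),
]

safe = shield(Action("click", "shop.example.com"), rules)
unsafe = shield(Action("delete_account", "shop.example.com"), rules)
```

In this sketch an empty result means the action passes every rule, while a non-empty result names the violated rules so the guardrail can block the action and explain why.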

ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning
