<ul><li>Adversarial threats against LLMs are evolving faster than current defenses can adapt, exposing a critical geometric blind spot in alignment.</li><li>ALKALI, a benchmark of 9,000 prompts spanning multiple attack families, is introduced to assess the vulnerability of 21 leading LLMs; it reveals high Attack Success Rates (ASRs).</li><li>To counter latent camouflage, GRACE (Geometric Representation-Aware Contrastive Enhancement) is introduced; it reduces ASR by up to 39% through preference learning and latent-space regularization.</li><li>AVQI, a geometry-aware metric, quantifies latent alignment failure by measuring cluster separation and compactness, providing insight into how models encode safety internally.</li></ul>

AdversariaL attacK sAfety aLIgnment (ALKALI): Safeguarding LLMs through GRACE: Geometric Representation-Aware Contrastive Enhancement - Introducing Adversarial Vulnerability Quality Index (AVQI)

