MixAT is a novel adversarial training method for Large Language Models (LLMs) that combines discrete and continuous adversarial attacks during training.
The approach aims to improve LLM robustness by addressing vulnerabilities that neither attack class covers on its own: discrete attacks operate at the token level, while continuous attacks perturb the embedding space. By mixing the two attack types during training, MixAT provides stronger defense against adversarial attacks while maintaining runtime efficiency.
The results highlight MixAT's potential to enhance the safety and reliability of LLMs, offering a promising tradeoff between robustness and computational cost.
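To make the combination of attack types concrete, the following is a minimal, heavily simplified sketch of mixed adversarial training in NumPy. A linear toy model stands in for the LLM, a greedy token substitution plays the role of a discrete attack, and an FGSM-style ascent step in embedding space plays the role of a continuous attack. All names, shapes, and hyperparameters here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumptions for this sketch): a small token embedding table E,
# a linear "model" w, and a squared-error loss toward a target y.
V, d = 8, 4                      # vocab size, embedding dimension
E = rng.normal(size=(V, d))      # token embedding table
w = rng.normal(size=d)           # toy model parameters
y = 1.0                          # training target

def loss(x, w):
    return 0.5 * (float(x @ w) - y) ** 2

def grad_x(x, w):
    # d(loss)/dx for the toy squared-error loss
    return (float(x @ w) - y) * w

def discrete_attack(w):
    # Discrete attack (stand-in): pick the vocab token whose
    # embedding currently maximizes the loss.
    return E[int(np.argmax([loss(E[t], w) for t in range(V)]))]

def continuous_attack(x, w, eps=0.25):
    # Continuous attack (stand-in): one FGSM-style sign-gradient
    # ascent step in embedding space.
    return x + eps * np.sign(grad_x(x, w))

# Mixed adversarial training loop: start from a discrete adversarial
# token, add a continuous perturbation on top, then take a gradient
# descent step on the resulting worst-case input.
for _ in range(200):
    x_adv = continuous_attack(discrete_attack(w), w)
    w = w - 0.01 * (float(x_adv @ w) - y) * x_adv

print(f"robust loss after training: "
      f"{loss(continuous_attack(discrete_attack(w), w), w):.4f}")
```

The key design point this sketch illustrates is that the continuous perturbation is applied on top of the discrete one, so training sees inputs that neither attack would produce alone.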