Adversarial Training (AT) has been widely applied to harden malware classifiers against adversarial evasion attacks.
However, the effectiveness of AT at identifying and strengthening vulnerable regions of the model's decision space, while maintaining high performance on clean data, remains underexplored.
Moreover, the robustness achieved by AT has often been assessed against unrealistic or weak adversarial attacks, and the hardening it provides can come at the cost of performance on clean data.
Factors such as data, feature representations, classifiers, and robust optimization settings influence the effectiveness of AT.
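For reference, AT is commonly cast as the robust optimization (min-max) problem sketched below; the notation is generic rather than taken from this work, with $\Delta$ standing for the set of admissible perturbations (which in the malware domain are typically discrete, functionality-preserving transformations rather than simple additive noise).
\[
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\delta \in \Delta} \; \mathcal{L}\big(f_{\theta}(x + \delta),\, y\big) \right]
\]
The inner maximization searches for the worst-case perturbation of each training sample, while the outer minimization updates the model parameters $\theta$ against those perturbed samples; how each of the factors above is instantiated shapes both terms of this objective.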