This paper takes a cryptographic perspective on mitigating versus detecting adversarial inputs to machine learning algorithms at inference time.
It introduces the notions of defense by detection (DbD) and defense by mitigation (DbM), formalized via correctness, completeness, and soundness properties: a successful defense must handle adversarial inputs without significantly degrading the algorithm's performance on honest inputs.
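As a rough illustration (the notation below is ours, a schematic of such definitions rather than the paper's exact formulation), a detection defense $D$ for a model $f$ with ground truth $f^\ast$ would be required to satisfy, for a negligible $\varepsilon$ and honest input distribution $\mathcal{D}$:

\begin{align*}
\textbf{Completeness:}\quad & \Pr_{x \sim \mathcal{D}}\big[D(x) = \mathsf{accept}\big] \ge 1 - \varepsilon && \text{(honest inputs are rarely flagged)}\\
\textbf{Soundness:}\quad & \Pr\big[D(\tilde{x}) = \mathsf{accept} \,\wedge\, f(\tilde{x}) \neq f^\ast(\tilde{x})\big] \le \varepsilon && \text{(adversarial inputs that fool $f$ are flagged)}
\end{align*}

A mitigation defense analogously replaces the accept/reject verdict with a corrected output: roughly, a mitigator $M$ should guarantee $\Pr[f(M(\tilde{x})) \neq f^\ast(\tilde{x})] \le \varepsilon$ even on adversarially chosen $\tilde{x}$, while preserving the model's behavior on $\mathcal{D}$.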
The main result is that achieving DbD and achieving DbM are equivalent for machine learning classification tasks, whereas the two notions separate for generative learning tasks: the paper exhibits a setting in which defense by mitigation is possible, yet defense by detection is impossible under the paper's cryptographic assumptions.
The separation relies on cryptographic tools, namely Identity-Based Fully Homomorphic Encryption (IB-FHE) and Non-Parallelizing Languages with Average-Case Hardness (NPL), to demonstrate that defending by mitigation is feasible under these assumptions even where detection fails.
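For intuition (a hedged paraphrase of the standard notion of non-parallelizing hardness from the literature, not necessarily the paper's exact definition), a language $L$ is non-parallelizing with average-case hardness if membership is decidable in sequential time $t(n)$, yet any computation of significantly smaller depth decides random instances no better than guessing:

\[
\Pr_{x \leftarrow \{0,1\}^n}\big[C(x) = L(x)\big] \le \tfrac{1}{2} + \mathrm{negl}(n)
\quad \text{for every circuit } C \text{ of depth } o(t(n)).
\]

Intuitively, hardness against low-depth computation creates an asymmetry between parties with different sequential running times, which is the kind of gap that separation arguments of this sort typically exploit.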