Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, in which an attacker tampers with a model so that it behaves maliciously on specific inputs.
A backdoor can be injected during training, typically by poisoning the training data, or after training by directly altering the network's weights and biases.
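As a concrete illustration of the training-time route, a minimal BadNets-style data-poisoning sketch in PyTorch might look as follows; the trigger shape, target label, and poison rate are arbitrary values chosen here for illustration, not parameters from any specific attack.

```python
import torch

def poison_batch(images, labels, target_label=0, poison_rate=0.1, patch_size=3):
    """BadNets-style poisoning sketch: stamp a small white square onto a
    fraction of the batch and relabel those samples to the attacker's target.
    The target label, poison rate, and patch size are illustrative choices."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(poison_rate * images.size(0))
    idx = torch.randperm(images.size(0))[:n_poison]
    # Stamp the trigger into the bottom-right corner of each selected image.
    images[idx, :, -patch_size:, -patch_size:] = 1.0
    labels[idx] = target_label
    return images, labels
```

Training on batches preprocessed this way typically leaves accuracy on clean data intact while binding the stamped pattern to the attacker's chosen label.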
Outsourcing training through Machine Learning as a Service (MLaaS) introduces security risks, since the customer may unknowingly receive a backdoored model.
Backdoored networks perform normally on clean inputs but misclassify inputs that contain a hidden trigger, which poses serious risks in safety-critical applications such as autonomous driving.
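This dual behavior is usually quantified by two numbers: accuracy on clean inputs and the attack success rate (ASR) on triggered inputs. Below is a minimal evaluation sketch, assuming the hypothetical poison_batch helper above and a standard PyTorch dataloader.

```python
import torch

@torch.no_grad()
def evaluate_backdoor(model, loader, target_label=0, device="cpu"):
    """Clean accuracy: fraction of unmodified inputs classified correctly.
    Attack success rate: fraction of triggered inputs classified as the target."""
    model.eval()
    clean_correct, asr_hits, total = 0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        clean_correct += (model(images).argmax(1) == labels).sum().item()
        # Stamp the trigger on every image (poison_rate=1.0); labels are unused here.
        triggered, _ = poison_batch(images, labels, target_label, poison_rate=1.0)
        asr_hits += (model(triggered).argmax(1) == target_label).sum().item()
        total += labels.size(0)
    return clean_correct / total, asr_hits / total
```

A successfully backdoored model shows high values for both metrics, which is exactly why the attack is hard to notice from clean-data validation alone.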
Defenses such as Neural Cleanse (NC) and FeatureRE detect backdoored models by reverse-engineering the trigger that the attacker implanted.
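Neural Cleanse, for instance, optimizes a mask and pattern for every class so that stamping them onto clean images flips the prediction to that class with as small a perturbation as possible; classes whose recovered trigger is anomalously small are flagged. The sketch below is a simplified version of that per-class optimization (the loss weight, step count, and learning rate are illustrative, and the MAD-based outlier test over the mask norms is omitted).

```python
import torch
import torch.nn.functional as F

def reverse_trigger(model, loader, target_label, img_shape=(3, 32, 32),
                    steps=500, lr=0.1, lambda_l1=1e-3, device="cpu"):
    """Neural Cleanse-style sketch: learn a mask m and pattern p such that
    (1 - m) * x + m * p is classified as target_label, while an L1 penalty
    keeps the mask (i.e., the trigger footprint) small."""
    for p in model.parameters():
        p.requires_grad_(False)
    mask = torch.zeros(1, *img_shape[1:], device=device, requires_grad=True)
    pattern = torch.rand(img_shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    data_iter = iter(loader)
    for _ in range(steps):
        try:
            images, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            images, _ = next(data_iter)
        images = images.to(device)
        m, pat = torch.sigmoid(mask), torch.sigmoid(pattern)
        stamped = (1 - m) * images + m * pat
        target = torch.full((images.size(0),), target_label,
                            dtype=torch.long, device=device)
        loss = F.cross_entropy(model(stamped), target) + lambda_l1 * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), torch.sigmoid(pattern).detach()
```

Running this once per class and flagging classes whose recovered mask has an abnormally small L1 norm (for example via the median absolute deviation) is the core of the NC detection pipeline.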
More recent methods such as BTI-DBF and BAN target feature-space backdoor attacks, the former by inverting triggers more efficiently and the latter by exploiting adversarial neuron noise.
The BAN approach generates adversarial neuron noise and masked feature maps to expose backdoor behavior and separate benign from backdoor-related features.
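A loose, simplified PyTorch sketch of the adversarial-neuron-noise half of this idea is shown below; it is not the authors' exact procedure, the choice of layer to perturb and the noise budget are assumptions made here, and the masked-feature-map component is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def adversarial_neuron_noise(model, images, labels, layer, eps=0.2, steps=5):
    """Loose sketch of the adversarial-neuron-noise idea (not the exact BAN
    optimization): perturb one layer's weights, rather than the input, in the
    direction that increases the loss on clean data. In a backdoored model a
    small weight perturbation of this kind tends to activate the backdoor,
    so the perturbed model's predictions collapse toward the target class.
    The layer, eps, and step count here are illustrative assumptions."""
    model.eval()
    original = layer.weight.detach().clone()
    for _ in range(steps):
        layer.weight.grad = None
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        with torch.no_grad():
            # Gradient-ascent step on the weights, kept inside an eps-ball.
            layer.weight += (eps / steps) * layer.weight.grad.sign()
            layer.weight.clamp_(original - eps, original + eps)
    with torch.no_grad():
        class_dist = model(images).softmax(1).mean(0)  # prediction mass per class
        layer.weight.copy_(original)                   # restore the clean weights
    return class_dist
```

A class distribution that concentrates heavily on a single label under such weight noise is the kind of signal a BAN-style detector can threshold, together with the learned feature masks, to decide whether a model is backdoored.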
BAN has proven efficient and scalable at identifying backdoored neural networks, reaching an average detection accuracy of about 97.22% across different architectures.
Because backdoors barely degrade performance on clean data, they are inherently hard to detect, making continuous monitoring and ongoing research in this area crucial for securing deep neural networks.
Sustained progress in neural network security, especially against backdoor attacks, is essential for maintaining the integrity of machine learning systems.