CLEAR: Clean-up Trigger-Free Backdoor in Neural Networks

Backdoor attacks pose a serious security threat to deep neural networks by fooling a model into misclassifying certain inputs crafted by an attacker. In particular, trigger-free backdoor attacks are especially challenging to detect and mitigate. Such an attack targets one or a few specific samples, called target samples, and causes them to be misclassified into a target class. Because no trigger is planted in the backdoored model, existing backdoor detection schemes fail against trigger-free backdoors, as they depend on reverse-engineering the trigger. In this paper, we propose a novel scheme to detect and mitigate trigger-free backdoor attacks. We discover and demonstrate a unique feature of trigger-free backdoor attacks: they force a decision-boundary change such that small "pockets" are formed around the target sample. Based on this observation, we propose a defense mechanism that pinpoints a malicious pocket by "wrapping" it into a tight convex hull in the feature space. We design an effective algorithm to search for such a convex hull, and we remove the backdoor by fine-tuning the model on the identified malicious samples with labels corrected according to the convex hull. Experiments show that the proposed approach is highly effective at detecting and mitigating a wide range of trigger-free backdoor models.
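The core idea of wrapping a boundary "pocket" in a tight convex hull can be illustrated with a toy sketch. The snippet below is not the paper's algorithm; it is a minimal 2-D illustration, assuming feature embeddings have already been extracted, in which a hull is built around hypothetical pocket samples and a membership test flags points falling inside it.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)

# Hypothetical 2-D feature embeddings of samples clustered around a
# target sample (the "pocket" carved out by a trigger-free backdoor).
pocket_feats = rng.normal(loc=[2.0, 2.0], scale=0.2, size=(30, 2))

# Triangulating the pocket samples lets us test convex-hull membership:
# a point is inside the hull iff it lies in some simplex.
hull = Delaunay(pocket_feats)

def in_hull(point: np.ndarray, tri: Delaunay) -> bool:
    """Return True if `point` lies inside the convex hull of `tri`'s points."""
    return tri.find_simplex(point[np.newaxis, :])[0] >= 0

# A point near the pocket centre is flagged; a distant benign point is not.
print(in_hull(np.array([2.0, 2.0]), hull))    # inside the pocket hull
print(in_hull(np.array([-3.0, -3.0]), hull))  # well outside the hull
```

In the full defense, samples identified this way would be relabeled according to the hull and used to fine-tune the model; this sketch only shows the geometric membership test.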