[Paper Reading] A Self-supervised Approach for Adversarial Robustness (CVPR 2020)

Original

2020-07-04 19:49

This paper combines the benefits of adversarial training and input processing, proposing a self-supervised adversarial training mechanism in the input space. Code is available at: https://github.com/Muzammal-Naseer/NRP

Introduction:

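Building on the complementary strengths of adversarial training (AT) and input-processing methods, we propose a self-supervised AT mechanism in the input space. Our approach (Fig. 1) uses a min-max (saddle-point) formulation to learn an optimal input-processing function that enhances model robustness. In this way, our optimization rule implicitly performs AT. The main advantage of our approach is its generalization ability: once trained on one dataset, it can be applied off the shelf to safeguard a completely different model. This makes it a more attractive solution than popular AT approaches, which are computationally expensive (and therefore scale poorly to large datasets). Furthermore, compared with previous preprocessing-based defenses, which have been found vulnerable to recent attacks, our defense shows better robustness.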
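The min-max idea described above can be written schematically as a two-level problem. The notation here is an assumption introduced for illustration and is not taken from this post: $P_\theta$ is the input-processing (purifying) function with parameters $\theta$, $F$ is a fixed feature extractor, $\Delta$ is a distance in feature space, $\mathcal{L}$ is the training loss of the purifier, and $\epsilon$ is the perturbation budget.

```latex
x' = \arg\max_{\tilde{x}\,:\,\|\tilde{x} - x\|_\infty \le \epsilon}
      \Delta\big(F(\tilde{x}),\, F(x)\big),
\qquad
\theta^{*} = \arg\min_{\theta}\;
      \mathbb{E}_{x}\Big[\mathcal{L}\big(P_\theta(x'),\, x\big)\Big]
```

The inner step searches for a worst-case input without using labels; the outer step trains the purifier to map that input back toward the clean sample.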

Main contributions:

• Task Generalizability: To ensure a task-independent AT mechanism, we propose to adversarially train a purifying model named Neural Representation Purifier (NRP). Once trained, NRP can be deployed to safeguard across different tasks, e.g., classification, detection and segmentation, without any additional training (Sec. 3).
• Self-Supervision: The supervisory signal used for AT should be self-supervised to make it independent of label space. To this end, we propose an algorithm to train NRP on adversaries found in the feature space in random directions to avoid any label leakage (Sec. 3.1).
• Defense against strong perturbations: Attacks are continuously evolving. In order for NRP to generalize, it should be trained on worst-case perturbations that are transferable across different tasks. We propose to find highly transferable perceptual adversaries (Sec. 4.3).
• Maintaining Accuracy: A strong defense must concurrently maintain accuracy on the original data distribution. We propose to train the NRP with an additional discriminator to bring adversarial examples close to original samples by recovering the fine texture details (Sec. 4.2).
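To make the self-supervision point concrete, here is a minimal toy sketch of finding an adversary in feature space without any labels. It is not the authors' implementation: the linear feature map `W`, the helper names, and the values of `eps`, `step` and `iters` are all assumptions chosen so the example runs with the Python standard library alone. The sketch starts from a random perturbation and takes signed gradient steps that increase the feature distance between the clean and perturbed inputs, projecting back into an L-infinity ball after each step.

```python
import random

def features(W, x):
    """Toy linear feature extractor F(x) = W x (stand-in for a deep network)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def feature_distance(W, x, x_adv):
    """L2 distance between the features of the clean and perturbed inputs."""
    fx, fa = features(W, x), features(W, x_adv)
    return sum((a - b) ** 2 for a, b in zip(fa, fx)) ** 0.5

def distance_gradient(W, x, x_adv):
    """Gradient of the feature distance with respect to x_adv.

    For F(x) = W x this equals W^T (F(x_adv) - F(x)) / distance.
    """
    fx, fa = features(W, x), features(W, x_adv)
    diff = [a - b for a, b in zip(fa, fx)]
    dist = sum(d * d for d in diff) ** 0.5
    if dist == 0.0:
        return [0.0] * len(x)
    return [sum(W[i][j] * diff[i] for i in range(len(W))) / dist
            for j in range(len(x))]

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def self_supervised_perturbation(W, x, eps=0.1, step=0.02, iters=10, seed=0):
    """Maximise feature distortion inside an L-infinity ball; no labels used."""
    rng = random.Random(seed)
    # Random start: a label-free initial direction inside the eps-ball.
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(iters):
        grad = distance_gradient(W, x, x_adv)
        # Signed gradient ascent step on the feature distance.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back onto the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical data: a random 4x8 feature map and an 8-dimensional "image".
rng = random.Random(1)
W = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
x = [rng.random() for _ in range(8)]
x_adv = self_supervised_perturbation(W, x)
```

Because the objective depends only on the features of the input itself, the same procedure applies regardless of the downstream task's label space. In a real setting, `features` would be replaced by a pretrained network and the gradient would come from automatic differentiation.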