02/2021 – 08/2021
Adversarial attacks attempt to fool deep learning systems into misclassifying a given input. To this end, existing input data is modified so subtly that the changes are usually imperceptible to humans. These minor changes nevertheless degrade the classification performance of the attacked system dramatically, and they make it hard for machines and humans alike to distinguish adversarial from benign input. In an attempt to uncover the mechanisms behind the general vulnerability of modern artificial neural network architectures to adversarial attacks, a number of attack and defense strategies have been proposed. Current research focuses mainly on robust accuracy, i.e., how well a model still classifies the data correctly despite the modifications. In each iteration of the "arms race" between offensive and defensive strategies, the research is usually limited to a single type of adversarial attack (e.g., attacks constrained by the L1 norm) [1, 2].
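To make the notion of robust accuracy under a norm constraint concrete, the following minimal sketch evaluates a linear binary classifier, for which the worst-case perturbation can be computed exactly: a perturbation of size eps in a given norm reduces the classification margin by at most eps times the dual norm of the weight vector. The linear model, the toy data, and the names `dual_norm` and `robust_accuracy` are illustrative assumptions and not part of the thesis; for deep networks the worst case has to be approximated by an attack.

```python
import math


def dual_norm(w, norm):
    """Dual norm of w for a perturbation budget measured in `norm`."""
    if norm == "linf":
        return sum(abs(v) for v in w)             # dual of L-inf is L1
    if norm == "l2":
        return math.sqrt(sum(v * v for v in w))   # L2 is its own dual
    if norm == "l1":
        return max(abs(v) for v in w)             # dual of L1 is L-inf
    raise ValueError(f"unknown norm: {norm}")


def robust_accuracy(w, b, xs, ys, eps, norm):
    """Fraction of points a linear classifier sign(w.x + b) still gets right
    under the worst-case perturbation of size eps (labels are +1 or -1)."""
    penalty = eps * dual_norm(w, norm)
    correct = 0
    for x, y in zip(xs, ys):
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        if margin - penalty > 0:
            correct += 1
    return correct / len(xs)


# Toy data (hypothetical): four 2-D points with labels +1 / -1.
w, b = [2.0, -1.0], 0.0
xs = [[1.0, 0.0], [0.125, 0.0], [-1.0, 0.5], [0.0, 0.5]]
ys = [1, 1, -1, -1]

for norm in ("linf", "l2", "l1"):
    print(norm, robust_accuracy(w, b, xs, ys, eps=0.1, norm=norm))
```

With eps = 0 the function returns the ordinary (clean) accuracy. The same budget eps can yield different robust accuracies depending on the norm, which is one reason why results for a single attack type do not automatically transfer to others.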
In this thesis, we propose a defensive approach that aims to generalize across multiple types of attacks. To achieve this, we formulate adversarial training as a multi-task learning problem. Multi-task learning fosters generalization by sharing model parameters between tasks. Applied to adversarial training, this forces the network to simultaneously detect and classify a potential attack: the network predicts both the class of the input and the perturbation that was applied to it. During training, we therefore confront the model with perturbations constrained by different norms.
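One possible reading of this multi-task formulation is sketched below: a shared layer feeds two output heads, one predicting the class and one predicting the type of perturbation, and the two cross-entropy losses are summed. The tiny network, the attack label set `ATTACK_TYPES`, the weighting factor `lam`, and all parameter values are hypothetical choices made for illustration; the text does not specify the architecture or the loss.

```python
import math


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def multitask_loss(params, x, class_label, attack_label, lam=1.0):
    """Joint loss of the class head and the attack-type head.

    Both heads read the same shared representation, so gradients from
    both tasks would update the shared parameters during training.
    """
    shared = relu(linear(params["W_shared"], params["b_shared"], x))
    class_logits = linear(params["W_class"], params["b_class"], shared)
    attack_logits = linear(params["W_attack"], params["b_attack"], shared)
    loss_class = cross_entropy(class_logits, class_label)
    loss_attack = cross_entropy(attack_logits, attack_label)
    return loss_class + lam * loss_attack, loss_class, loss_attack


# Hypothetical label set for the perturbation head.
ATTACK_TYPES = ["clean", "l1", "l2", "linf"]

# Hypothetical parameters: 2 inputs -> 3 shared units -> 2 classes / 4 attack types.
params = {
    "W_shared": [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]],
    "b_shared": [0.0, 0.0, 0.0],
    "W_class": [[0.3, -0.1, 0.2], [-0.2, 0.4, 0.1]],
    "b_class": [0.0, 0.0],
    "W_attack": [[0.1, 0.0, -0.1], [0.0, 0.2, 0.1],
                 [-0.3, 0.1, 0.0], [0.2, -0.2, 0.3]],
    "b_attack": [0.0, 0.0, 0.0, 0.0],
}

# One training example: an input of class 1 perturbed under the L-inf norm.
total, l_cls, l_att = multitask_loss(
    params, [1.0, 2.0], class_label=1,
    attack_label=ATTACK_TYPES.index("linf"), lam=0.5)
print(total, l_cls, l_att)
```

In an actual training loop, each batch would mix clean inputs with inputs perturbed under the different norms, and the joint loss would be minimized with a gradient-based optimizer in a deep learning framework.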
[1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, Rob Fergus. Intriguing properties of neural networks. ICLR, 2014. URL http://arxiv.org/abs/1312.6199
[2] Xiaoyong Yuan, Pan He, Qile Zhu, Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 2019. URL http://arxiv.org/abs/2009.03728
[3] Sebastian Ruder. An overview of multi-task learning in deep neural networks. arXiv:1706.05098v1, 2017. URL http://arxiv.org/abs/1706.05098
[4] Bakhti, Fezza, Hamidouche, Déforges. DDSA: A Defense Against Adversarial Attacks Using Deep Denoising Sparse Autoencoder. IEEE Access, vol. 7, pp. 160397-160407, 2019. DOI: 10.1109/ACCESS.2019.2951526
[5] Croce, Andriushchenko, Sehwag, Flammarion, Chiang, Mittal, Hein. RobustBench: a standardized adversarial robustness benchmark. arXiv:2010.09670v1 [cs.LG], 2020. URL http://arxiv.org/abs/2010.09670, http://github.com/RobustBench/robustbench