
Pavlo Beylin

Advisors
Leo Schwinn (M.Sc.), René Raab (M.Sc.), Prof. Dr. Björn Eskofier

Duration
02/2021 – 08/2021

Abstract

Adversarial attacks attempt to fool deep learning systems into misclassifying the provided input.
To this end, existing input data is modified in a subtle way, such that the changes are usually
imperceptible to humans. These minor modifications, however, dramatically degrade the classification
performance of the attacked system, which makes distinguishing adversarial from benign
input hard for machines and humans alike [1]. In an attempt to uncover the underlying mechanics
of the general vulnerability of modern neural network architectures to adversarial attacks,
a number of attack and defense strategies have been proposed [2]. Current research focuses
mainly on the metric of robust accuracy, i.e., how well the models classify the data correctly
despite the modifications. In each iteration of the "arms race" between offensive and defensive
strategies, the research is usually limited to a single type of adversarial attack
(e.g., attacks constrained by the L1 norm) [1, 2].
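To make the norm-constrained modifications described above concrete, the sketch below implements the Fast Gradient Sign Method (FGSM), one of the simplest adversarial attacks, which bounds the perturbation by a budget epsilon in the L-infinity norm. The PyTorch framing, the function name, and the value of epsilon are illustrative assumptions for this page, not the attack configuration used in the thesis.

import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    # Fast Gradient Sign Method: a single step along the sign of the loss
    # gradient. The perturbation is bounded by epsilon in the L-infinity norm,
    # which keeps the modification visually subtle.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Clamp back to the valid input range (assumed here to be [0, 1]).
    return x_adv.clamp(0.0, 1.0).detach()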
In this thesis we propose a defensive approach that aims to generalize across multiple types of
attacks. To achieve this, we formulate adversarial training as a multi-task learning problem,
a technique that fosters generalization by sharing model parameters between tasks [3]. Applied
to adversarial training, this means the network is forced to simultaneously classify the input
and detect a potential attack: it predicts both the class label and the type of the identified
input perturbation. During training, the model is therefore confronted with attacks restricted
by different norms, as sketched below.
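A minimal sketch of the proposed multi-task setup: a shared feature extractor with one head for the class label and one head for the type of perturbation (e.g., clean, L1-, L2-, or L-infinity-bounded). The PyTorch formulation and all names (backbone, feat_dim, alpha, ...) are illustrative assumptions; the actual architecture and loss weighting are defined in the thesis itself.

import torch.nn as nn
import torch.nn.functional as F

class MultiTaskAdversarialNet(nn.Module):
    # Shared backbone with two heads: one classifies the input, the other
    # predicts which kind of perturbation (if any) the input carries.
    def __init__(self, backbone, feat_dim, num_classes, num_attack_types):
        super().__init__()
        self.backbone = backbone
        self.class_head = nn.Linear(feat_dim, num_classes)
        self.attack_head = nn.Linear(feat_dim, num_attack_types)

    def forward(self, x):
        features = self.backbone(x)
        return self.class_head(features), self.attack_head(features)

def multi_task_loss(class_logits, attack_logits, y_class, y_attack, alpha=0.5):
    # Weighted sum of the two cross-entropy objectives; alpha trades off
    # classification accuracy against attack identification.
    return (F.cross_entropy(class_logits, y_class)
            + alpha * F.cross_entropy(attack_logits, y_attack))

Because both heads share the backbone parameters, gradients from the attack-identification task shape the same features that are used for classification, which is the parameter-sharing effect through which multi-task learning is expected to aid generalization [3].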

 

References:

[1] Szegedy, Christian; Zaremba, Wojciech; Sutskever, Ilya; Bruna, Joan; Erhan, Dumitru;
Goodfellow, Ian J.; Fergus, Rob: Intriguing properties of neural networks. ICLR, 2014.
URL http://arxiv.org/abs/1312.6199

[2] Yuan, Xiaoyong; He, Pan; Zhu, Qile; Li, Xiaolin: Adversarial examples: Attacks and defenses
for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 2019. URL
http://arxiv.org/abs/1712.07107

[3] Ruder, Sebastian: An overview of multi-task learning in deep neural networks.
arXiv:1706.05098, 2017. URL http://arxiv.org/abs/1706.05098

[4] Bakhti, Fezza, Hamidouche, Déforges: DDSA: A Defense Against Adversarial Attacks Using
Deep Denoising Sparse Autoencoder. IEEE Access, vol. 7, pp. 160397-160407, 2019. DOI:
10.1109/ACCESS.2019.2951526

[5] Croce, Andriushchenko, Sehwag, Flammarion, Chiang, Mittal, Hein: RobustBench: a
standardized adversarial robustness benchmark. arXiv:2010.09670, 2020. URL
http://arxiv.org/abs/2010.09670, http://github.com/RobustBench/robustbench