05/2021 – 11/2021
Deep neural networks (DNNs) are now used in a variety of applications, such as image
classification, speech recognition, and autonomous driving. Still, recent studies have shown that
DNNs are vulnerable to adversarial examples. In image classification, adversarial examples are
images obtained by applying small perturbations that are barely perceptible to humans but lead
to a different predicted output with high confidence [2, 1].
There are several approaches to increasing the robustness of a DNN. For example, the network can be trained
with adversarial training, where the training data is augmented with adversarial examples until the neural
network assigns the correct classes [5, 7]. Rakin et al. proposed a method that combines adversarial
training with Parametric Noise Injection (PNI), in which trainable Gaussian noise is additionally
injected into each layer. A possible drawback of this approach is that the noise parameters
tend to become quite small, making the noise injection less effective over time. In preliminary
experiments, we also noticed that perturbing the adversarial attacks during the training of the
network leads to weaker attacks and thus to less robust defenses against strong attacks.
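The concepts above can be written compactly. The following is a sketch in assumed notation, not taken from the text: x is an input with label y, f_θ the network with parameters θ, L the training loss, δ a perturbation bounded by a budget ε (the ℓ∞ norm is one common choice), and α the trainable PNI scaling coefficient whose shrinking is the drawback mentioned above.

```latex
% Adversarial example: a perturbation \delta bounded by a budget \epsilon
x' = x + \delta, \qquad \|\delta\|_\infty \le \epsilon

% Adversarial training as a min-max problem over the network parameters \theta
\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}}
  \Big[ \max_{\|\delta\|_\infty \le \epsilon}
        \mathcal{L}\big(f_\theta(x + \delta), y\big) \Big]

% PNI: a tensor v (weights, activations or input) receives scaled Gaussian noise
\tilde{v} = v + \alpha \, \eta, \qquad \eta \sim \mathcal{N}(0, \sigma^2)
```

If α approaches zero during training, the injected term vanishes and the layer behaves like its noise-free counterpart.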
In this thesis, a noise injection approach for adversarial training will be implemented. This involves
training a neural network, or using an already trained robust network, and afterwards training
a learnable pixel-wise noise injection layer in front of the network. The thesis evaluates whether this
approach can increase the robustness of a DNN against adversarial attacks. Additionally, human
visual saliency information is combined with the noise injection to determine whether this can further
improve the robustness of neural networks. With saliency information, noise can be added specifically
to each pixel depending on whether it belongs to the foreground or the background of the
image. This approach aims to weaken adversarial attacks in the background of the image while
the foreground is left mostly untouched. First, the knowledge about visual attention
from the COCO data set is used; in a second step, the results obtained with the COCO data set
annotations are compared to an approach with saliency information obtained by an automatic
method, e.g. the saliency model of Itti et al.
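The saliency-dependent noise injection can be illustrated with a minimal forward-pass sketch. It uses only the Python standard library, and all names (`inject_pixelwise_noise`, `sigma_fg`, `sigma_bg`) are hypothetical. The two fixed standard deviations stand in for the noise parameters that would be learned per pixel during training, and the saliency map is assumed to lie in [0, 1], with 1 marking foreground.

```python
import random


def inject_pixelwise_noise(image, saliency, sigma_fg, sigma_bg, rng):
    """Add zero-mean Gaussian noise to each pixel, scaled by its saliency.

    image, saliency: 2-D lists of floats with the same shape. Pixel values
    and saliency values are assumed to lie in [0, 1] (1 = foreground).
    sigma_fg, sigma_bg: noise standard deviations for foreground and
    background; placeholders for parameters that would be learned.
    """
    noisy = []
    for img_row, sal_row in zip(image, saliency):
        out_row = []
        for pixel, sal in zip(img_row, sal_row):
            # Interpolate the noise strength between background and foreground.
            sigma = sal * sigma_fg + (1.0 - sal) * sigma_bg
            value = pixel + rng.gauss(0.0, sigma)
            # Clamp so the result stays a valid pixel intensity.
            out_row.append(min(1.0, max(0.0, value)))
        noisy.append(out_row)
    return noisy


# Toy 2x2 image: left column is foreground, right column is background.
image = [[0.5, 0.5], [0.5, 0.5]]
saliency = [[1.0, 0.0], [1.0, 0.0]]
noisy = inject_pixelwise_noise(
    image, saliency, sigma_fg=0.0, sigma_bg=0.1, rng=random.Random(0)
)
```

With `sigma_fg=0.0`, the foreground pixels pass through unchanged while the background pixels are perturbed. In an actual implementation, the noise strengths would be trainable tensors in a deep learning framework, placed in front of the (possibly frozen) network.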
 Madry, Aleksander, et al.: Towards Deep Learning Models Resistant to Adversarial Attacks.
 Goodfellow, Ian J., et al.: Explaining and Harnessing Adversarial Examples. 2015.
 Jeddi, Ahmadreza, et al.: Learn2Perturb: an End-to-end Feature Perturbation Learning to
Improve Adversarial Robustness. 2020.
 Rakin, Adnan Siraj, et al.: Parametric Noise Injection: Trainable Randomness to Improve
Deep Neural Network Robustness against Adversarial Attack. 2018.
 Schott, Lukas, et al.: Towards the first adversarially robust neural network model on MNIST.
 Lin, Tsung-Yi, et al.: Microsoft COCO: Common Objects in Context. 2015.
 Athalye, Anish, et al.: Obfuscated Gradients Give a False Sense of Security: Circumventing
Defenses to Adversarial Examples. 2018.
 Itti, Laurent, et al.: A model of saliency-based visual attention for rapid scene analysis.