Hermann Luft

Master's Thesis

Unsupervised evaluation of anomaly detection methods based on separability algorithms

Advisors
Philipp Schlieper (M.Sc.), Kai Klede (M.Sc.), Prof. Dr. Björn Eskofier

Duration
07 / 2022 – 01 / 2023

Abstract

In modern Internet of Things (IoT) applications, environmental sensors generate large volumes of heterogeneous data. Discrepancies in this data stem from sporadic errors or anomalies, which are unavoidable in non-deterministic systems and are the target of outlier detection. Filtering out these irregularities is widely regarded as an important open research problem [2, 9]. Especially in real-world applications, corrupted data leads to an inaccurate or unreliable IoT system [2, 9]; in autonomous driving, for instance, severe errors are not acceptable. The goal of recognizing these outliers has led to the development of multiple unsupervised algorithms [1, 4] based on statistics, artificial intelligence, or simple intuitions, such as Histogram-based Outlier Score, DeepAnT [3], or k-NN Global. Each approach provides its own outlier criteria and capabilities, and its suitability depends on the situation. Thus, a deeper understanding of separability methods is required to determine whether an algorithm fits in the context of an evolving data filtering application [5]. Another challenge arises from the limited availability of labeled training sets, as generating labeled data from sensors or medical diagnoses is very complex [4].

Furthermore, few research articles address the unsupervised evaluation of outlier detection algorithms [6, 10]. Marques et al. [6] establish an internal evaluation with the “Internal, Relative Evaluation of Outlier Solutions” (IREOS) index, which applies Kernel Logistic Regression (KLR) as a separability indicator. The KLR spans a margin between the detected outliers and the remaining dataset while taking the linearity of the decision boundary into account. Nonetheless, the potential of this evaluation method has not yet been fully exploited.
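
The idea of an internal index built on a separability indicator can be illustrated with a minimal sketch. It is not the implementation from [6]: instead of training a KLR, it uses a simple stand-in separability (one minus the mean RBF kernel similarity to all other points), and the names `standin_separability` and `ireos_style_index`, the toy data, and the grid of kernel widths `gammas` are assumptions made for illustration only. The sketch averages the separability of the flagged points over several kernel widths, so that a solution flagging an isolated point scores higher than one flagging a point inside a cluster.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel value exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def standin_separability(data, idx, gamma):
    """Stand-in for a classifier-based separability (NOT the KLR of [6]):
    one minus the mean kernel similarity of point idx to all other points."""
    others = [p for j, p in enumerate(data) if j != idx]
    mean_sim = sum(rbf_kernel(data[idx], p, gamma) for p in others) / len(others)
    return 1.0 - mean_sim


def ireos_style_index(data, outlier_idx, gammas, separability=standin_separability):
    """Average separability of the flagged points over a grid of gamma values."""
    per_gamma = []
    for gamma in gammas:
        scores = [separability(data, i, gamma) for i in outlier_idx]
        per_gamma.append(sum(scores) / len(scores))
    return sum(per_gamma) / len(per_gamma)


# Toy data (assumed): four clustered points and one isolated point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
gammas = [0.01, 0.1, 1.0]

good = ireos_style_index(data, [4], gammas)  # solution flagging the isolated point
bad = ireos_style_index(data, [0], gammas)   # solution flagging a cluster member
print(good > bad)
```

Because the separability function is passed as a parameter, other indicators can be swapped in without changing the aggregation step.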

This thesis aims to compare different separability techniques within the IREOS algorithm. For this purpose, several maximum margin classifiers are considered as alternatives to the KLR used so far. Besides the separability indicator employed in the IREOS index [6], a kNN-based distance technique or a nonlinear Support Vector Machine (SVM) exhibits similar response properties. However, the kNN-based alternative differs in time complexity and classification requirements. Hence, performance and results vary depending on the metric in use. A benchmark of different maximum margin classifiers offers new insights into the internal evaluation of outlier detection algorithms regarding efficiency and accuracy. A ground truth evaluation of the outlier detection algorithms is additionally provided to verify the correctness of this process. The thesis begins by motivating the choice of datasets [7, 8] used in the experiments and then applies the experimental design described by Marques et al. [6] as a working basis. Finally, the thesis assesses the kNN-based approach to internal evaluation with respect to accuracy and performance. In addition, the thesis gives an overview of the anomaly detection algorithms and datasets used in the internal evaluation.
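
A kNN-based distance technique could look like the following minimal sketch. It is an assumed formulation, not necessarily the one developed in the thesis: the hypothetical function `knn_separability` scores a point by the mean Euclidean distance to its `k` nearest neighbours, and the toy data and the choice `k = 2` are illustrative. Unlike a classifier-based indicator, this score needs no model training; a brute-force version only computes the distances from the query point to all other points.

```python
import math


def knn_separability(data, idx, k):
    """Mean Euclidean distance from point idx to its k nearest neighbours.

    Larger values mean the point lies farther from the rest of the data,
    i.e. it is easier to separate (assumed scoring rule for illustration).
    """
    dists = sorted(
        math.dist(data[idx], p) for j, p in enumerate(data) if j != idx
    )
    return sum(dists[:k]) / k


# Toy data (assumed): four clustered points and one isolated point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

scores = [knn_separability(data, i, k=2) for i in range(len(data))]
most_separable = max(range(len(data)), key=lambda i: scores[i])
print(most_separable)
```

In this toy example the isolated point receives the largest score, while the clustered points receive small, nearly identical scores.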

References
[1] Goldstein, Markus & Uchida, Seiichi. (2016). A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data. PLoS ONE, 11, e0152173. https://doi.org/10.1371/journal.pone.0152173
[2] Gaddam, A.; Wilkin, T.; Angelova, M.; Gaddam, J. Detecting Sensor Faults, Anomalies and Outliers in the Internet of Things: A Survey on the Challenges and Solutions. Electronics 2020, 9, 511. https://doi.org/10.3390/electronics9030511
[3] M. Munir, S. A. Siddiqui, A. Dengel and S. Ahmed, “DeepAnT: A Deep Learning Approach for Unsupervised Anomaly Detection in Time Series,” in IEEE Access, vol. 7, pp. 1991-2005, 2019, doi: 10.1109/ACCESS.2018.2886457.
[4] Ouardini, K. et al. (2019). Towards Practical Unsupervised Anomaly Detection on Retinal Images. In: Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data. DART 2019, MIL3ID 2019. Lecture Notes in Computer Science, vol 11795. Springer, Cham. https://doi.org/10.1007/978-3-030-33391-1_26
[5] Campos, G.O., Zimek, A., Sander, J. et al. On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Disc 30, 891–927 (2016). https://doi.org/10.1007/s10618-015-0444-8
[6] Henrique O. Marques, Ricardo J. G. B. Campello, Jörg Sander, and Arthur Zimek. 2020. Internal Evaluation of Unsupervised Outlier Detection. ACM Trans. Knowl. Discov. Data 14, 4, Article 47 (August 2020), 42 pages. https://doi.org/10.1145/3394053
[7] https://archive.ics.uci.edu/ml/dataset
[8] http://odds.cs.stonybrook.edu/
[9] Mohiuddin Ahmed, Abdun Naser Mahmood, Jiankun Hu, A survey of network anomaly detection techniques, Journal of Network and Computer Applications, Volume 60, 2016, Pages 19-31, ISSN 1084-8045, https://doi.org/10.1016/j.jnca.2015.11.016.
[10] Goix, Nicolas. (2016). How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms?