Dorsaf Jdidi

Master's Thesis

Unsupervised Disentanglement for Post-Identification of GNSS Interference in the Wild

Advisors
Dr. Dario Zanca, Dr.-Ing. Christopher Mutschler, Prof. Dr. Björn Eskofier, Dr.-Ing. Tobias Feigl

Duration
11 / 2022 – 05 / 2023

Abstract

Interference signals affect the processing chain of the global navigation satellite system (GNSS) and degrade its localization accuracy or even prevent localization completely. Therefore, potential interference signals or intentional jammers must be eliminated. However, to eliminate them and ensure localization, they must first be detected and then located. Classifying the waveform of a jammer signal helps to interpret the purpose of the signal and to identify unknown jammer categories. The state of the art shows that supervised learning enables optimal jammer classification under laboratory conditions [2]. For practical applications, however, supervised learning inherently requires reference labels for each potential jammer and its specific operating environment. In the real world, environmental influences such as multipath propagation interfere with and corrupt signals, and there are unknown (new) types of jammers and signal patterns. Therefore, state-of-the-art supervised methods are not applicable in real application environments, as they lack reference information and require an enormous, sometimes illegal, data collection effort. Hence, it is unknown how a GNSS analysis system can reliably and accurately detect and classify interference when the application environment, speed of movement, jamming signal distance and power level, antenna placement, and multipath propagation between GNSS satellite and receiver or between jammer and receiver vary non-deterministically or are even completely unknown, i.e., no reference label exists.

The GNSS community employs classic threshold-based methods to detect unknown jammers in a quasi-unsupervised way [1]. However, threshold-based detection does not allow for the classification, identification, or localization of a potential jammer. In the worst case, threshold-based methods do not detect a jammer if its signal is close to the noise floor, i.e., below the threshold. To compensate for these weaknesses, modern supervised methods learn to detect and classify jammers [2, 3]. However, one cannot generate realistic reference data representing all possible real-world waveforms of all possible categories of jammers under all effects in the wild [2, 3]. Therefore, there is no guarantee that supervised methods will detect all potential jammers.
Hence, there is a need for a model that adapts to site-specific effects and to variations in signals or new jammers over time. Other research areas, e.g., image processing, address similar challenges with image disentanglement and subsequent classification. The most promising manifold learning techniques [4] are so-called embedding methods. These are unsupervised dimensionality reduction methods that learn low-dimensional embeddings from data while preserving distinctive neighborhood properties of the original space. Well-known architectures used for this purpose are the Variational Autoencoder (VAE) [5], the Siamese Neural Network (SNN) [6], and the Triplet Neural Network (TNN) [7]. In contrast to VAEs, SNNs and TNNs provide representative embeddings even when there are many categories and few reference instances per category [8]. The literature shows that performing post-clustering analysis on the resulting learned representations instead of raw data points significantly improves classification accuracy [9, 10]. However, detailed studies on the extraction of characteristic features and their clustering based on SNN and TNN [9] have not yet been published within the GNSS community for interference detection and classification.
It is also completely unknown how (unsupervised) manifold learning may detect and classify interference in real-world applications. This thesis will therefore investigate, for the first time, how robust and accurate unsupervised learning is under variable and non-deterministic real-world artifacts for anomaly detection and classification.

The key idea of this work is that the novel framework provides a more robust and accurate analysis of interference in GNSS signals than traditional supervised approaches (that require reference labels), as it exploits new measurements of unknown interference signals and real environments in a quasi-unsupervised manner (without reference labels) and enables their subsequent interpretation for detection and classification. Thus, the effort required to obtain training data can be lowered, and additional information that is unknown or changes over time can be employed in the analysis. As the new framework works with low-dimensional representations of information, it also potentially reduces computational and energy costs.
The student will therefore research and develop a novel framework for unsupervised GNSS interference analysis. State-of-the-art unsupervised and semi-supervised methods may identify discriminative patterns and group data samples into multiple groups based on their similarity.
For this purpose, the student will embed complex high-dimensional input signals (e.g., 7 different sources of interference and realistic variants) in a low-dimensional space using different state-of-the-art learning algorithms (e.g., VAE, SNN, TNN). The idea is that manifold learning [4] may disentangle GNSS raw data in an unsupervised way. Therefore, the goal is to investigate how the latent space can be divided into different groups in which similar samples are spatially close to each other.
The student will adapt various state-of-the-art learning architectures and examine and evaluate them w.r.t. their applicability, accuracy, and robustness to detect and classify GNSS jammer categories without supervision. For this purpose, the student will explore the applicability of different embedding algorithms, e.g., Beta-VAE [5] and categorical-VAE [11] with different priors (standard or mixture of Gaussians [12]), SNN [6], TNN [7], as well as online pair/triplet mining [7] and variants thereof. Instead of using the generative aspects of the methods, the student will focus on analyzing the spatial separation of the encoded data in latent space. The student will also investigate the effects of different loss functions, e.g., contrastive [9], triplet [7], and Evidence Lower Bound (ELBO) [5]. S/he will also evaluate the effect of various distance metrics (both intra- and inter-cluster distances) in latent space, e.g., Euclidean, Mahalanobis [13], and cosine.
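The distance metrics and pairwise/triplet losses named above can be illustrated with a minimal NumPy sketch. It assumes the common margin-based Euclidean forms of the contrastive and triplet losses; the exact formulations in [7] and [9] may differ, and all function names and margin values are hypothetical and chosen for illustration only.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two latent vectors."""
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    """Cosine distance, i.e., 1 - cosine similarity."""
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mahalanobis(a, b, cov):
    """Mahalanobis distance for a given (invertible) covariance matrix."""
    diff = a - b
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def contrastive_loss(z1, z2, same, margin=1.0):
    """Margin-based contrastive loss for one pair (assumed form).

    Similar pairs are pulled together; dissimilar pairs are pushed
    apart until they are at least `margin` away from each other.
    """
    d = euclidean(z1, z2)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on squared Euclidean distances (assumed form)."""
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(0.0, d_pos - d_neg + margin)
```

In this sketch, a triplet whose negative is already far from the anchor contributes zero loss, which is why online triplet mining typically searches for harder triplets.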
The student will examine an optional processing step (e.g., t-SNE, UMAP) that may further reduce the dimension of the learned representations. S/he will also investigate clustering techniques [14], e.g., K-means, to assign appropriate reference labels of the jammers to the learned representations and to identify a potential jammer. The number of clusters will be selected with suitable scores, e.g., the V-measure or the normalised Mutual Information score.
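As an illustration of the clustering step, the following is a minimal NumPy sketch of K-means (Lloyd's algorithm) applied to already-embedded samples. The function name, the random initialization, and the toy 2-D embeddings are assumptions made for this example and are not part of the proposed framework.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain K-means on latent vectors; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct, randomly chosen samples.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy 2-D "embeddings" forming two well-separated groups (hypothetical data).
embeddings = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                       [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
cluster_ids, cluster_centers = kmeans(embeddings, k=2)
```

The resulting cluster indices are arbitrary; mapping them to jammer categories requires a separate labeling step.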
Finally, the student will develop methods (e.g., RF [15] and CNN [3]) for detecting and classifying interference based on the learned (low-dimensional) representations. In addition, this valuable additional information will be fed back into the processing chain to continuously optimize the detection and classification accuracy. The downstream classification step should therefore consist of three phases: clustering, distance thresholding, and classification-based labeling of new samples, followed, if feasible, by an update of the cluster members and their corresponding centroids after successful mapping to ensure continuity of the monitoring task.
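The distance-thresholding and labeling phases, together with the optional centroid update, could look roughly like the sketch below. It assumes that centroids and per-cluster sample counts are already available from a previous clustering step; the function name, the "unknown" marker, the category names, and the threshold value are hypothetical.

```python
import numpy as np

UNKNOWN = -1  # hypothetical marker for samples that match no known cluster

def label_and_update(z, centroids, counts, names, threshold):
    """Label one embedded sample and, if accepted, update its cluster.

    z         : latent vector of the new sample
    centroids : float array of shape (k, d), modified in place
    counts    : int array of shape (k,), modified in place
    names     : list of k category names, one per cluster
    threshold : maximum accepted distance to the nearest centroid
    """
    dists = np.linalg.norm(centroids - z, axis=1)
    nearest = int(np.argmin(dists))
    # Distance thresholding: reject samples that are far from all clusters.
    if dists[nearest] > threshold:
        return UNKNOWN, "unknown"
    # Running-mean update of the matched centroid.
    counts[nearest] += 1
    centroids[nearest] += (z - centroids[nearest]) / counts[nearest]
    return nearest, names[nearest]
```

Rejected samples could be collected and re-clustered later, which is one possible way to discover new jammer categories over time.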
To demonstrate the feasibility, the student will investigate the proposed processing chain in a large-scale measurement campaign with at least 6 sources of interference (e.g., chirp, noise, pulsed, multi-tone, modulated, and frequency hopping), interference-free data, and various real-world effects (e.g., different movement speeds, multipath effects, signal power levels, and distances between jammer and receiver), as well as generalization to unknown measurements from the same or different receivers. The quality of the new processing chain will be assessed by its usability for downstream tasks, e.g., classification. For this purpose, pre-trained classifiers will be evaluated using reference samples. Appropriate evaluation metrics will be reported, e.g., V-Measure [14] and Fβ with β = 2 [16], to quantify the dissimilarity between samples. Optionally, the variance or uncertainty of ensemble predictions will be reported.
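For reference, the two named metrics can be computed as in the following minimal sketch, which uses only the Python standard library. It assumes the standard definitions: Fβ as the weighted harmonic mean of precision and recall, and the V-measure as the harmonic mean of homogeneity and completeness. The function names are hypothetical.

```python
from collections import Counter
from math import log

def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives, false negatives."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0

def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def _conditional_entropy(target, given):
    """Conditional entropy H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())

def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness."""
    h_c = _entropy(classes)
    h_k = _entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

With β = 2, recall is weighted more heavily than precision, so missed jammers are penalized more strongly than false alarms.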

References
[1] Vennarini, Alessia and Coluccia, Andrea and Gerbeth, Daniel and Crespillo, Omar Garcia and Neri, Alessandro: Detection of GNSS Interference in Safety Critical Railway Applications using Commercial Receivers. Proc. Intl. Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+), 1476–1489, 2020.
[2] van der Merwe, Johannes Rossouw and Franco, David Contreras and Jdidi, Dorsaf and Feigl, Tobias and Rügamer, Alexander and Felber, Wolfgang: Low-cost COTS GNSS interference detection and classification platform: Initial results. Proc. Intl. Conf. on Localization and GNSS (ICL-GNSS), 1–8, 2022.
[3] Morales Ferre, Ruben and de la Fuente, Alberto and Lohan, Elena Simona: Jammer classification in GNSS bands via machine learning algorithms. Sensors Jo., 4841, 2019.
[4] Zhang, Junping and Li, Stan Z and Wang, Jue: Manifold learning and applications in recognition. Proc. Intl. Conf. on Intelligent multimedia processing with soft computing, 281–300, 2005.
[5] Kingma, Diederik P and Welling, Max: Auto-encoding variational bayes. arXiv:1312.6114 [stat.ML], 2013.
[6] Bromley, Jane and Guyon, Isabelle and LeCun, Yann and Säckinger, Eduard and Shah, Roopak: Signature verification using a "siamese" time delay neural network. Advances in neural information processing systems (NIPS), 6, 1993.
[7] Schroff, Florian and Kalenichenko, Dmitry and Philbin, James: Facenet: A unified embedding for face recognition and clustering. Proc. Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), 815–823, 2015.
[8] Koch, Gregory and Zemel, Richard and Salakhutdinov, Ruslan et al.: Siamese neural networks for one-shot image recognition. Proc. Intl. Conf. on Machine Learning (ICML), 2, 2015.
[9] Hsu, Yen-Chang and Kira, Zsolt: Neural network-based clustering using pairwise constraints. arXiv:1511.06321, 2015.
[10] Jiang, Zhuxi and Zheng, Yin and Tan, Huachun and Tang, Bangsheng and Zhou, Hanning: Variational deep embedding: An unsupervised and generative approach to clustering. arXiv:1611.05148, 2016.
[11] Jang, Eric and Gu, Shixiang and Poole, Ben: Categorical reparameterization with Gumbel-Softmax. arXiv:1611.01144, 2016.
[12] Dilokthanakul, Nat and Mediano, Pedro AM and Garnelo, Marta and Lee, Matthew CH and Salimbeni, Hugh and Arulkumaran, Kai and Shanahan, Murray: Deep unsupervised clustering with gaussian mixture variational autoencoders. arXiv:1611.02648 [stat.ML], 2016.
[13] Mahalanobis, Prasanta Chandra: On the generalized distance in statistics. Proc. Intl. Conf. on National Institute of Science of India, 1936.
[14] Norlander, Erik and Sopasakis, Alexandros: Latent space conditioning for improved classification and anomaly detection. arXiv:1911.10599 [stat.ML], 2019.
[15] Breiman, Leo: Random forests. Machine learning Jo., 45, 5–32, 2001.
[16] Chinchor, Nancy and Sundheim, Beth M: MUC-5 evaluation metrics. Proc. Conf. Message Understanding Conference (MUC), 1993.