Thomas Neher

Master's Thesis

On the Robustness of Semi-supervised Learning under adverse Data Distributions: The example of Handwriting Classification


Christoffer Löffler (M.Sc.), Prof. Dr. B. Eskofier


05/2021 – 11/2021


Semi-supervised learning (SSL) promises to enable the introduction of large amounts of
unlabeled data into the machine learning process, potentially greatly improving model
performance [1][2][3]. This is especially useful when an otherwise purely supervised training
scheme does not have sufficient labeled data available for the learning task at hand. However,
in practice, performance has been shown to degrade unpredictably at times, rather than improve,
upon addition of unlabeled data [4][5]. This is an open research topic that has recently gained
renewed attention from the scientific community [5][15]. Reasons for performance
degradation are thought to be related to adverse data distributions that break SSL
assumptions like smoothness [6], or show shifts between labeled and unlabeled data [4][7].
One application that may benefit substantially from using new, unlabeled data for training is
handwriting recognition. However, in the realistic OnHW [15] time series dataset, adverse data
distributions inhibit the use of SSL. Labeled data may come from only a limited number of
writers, thus introducing sample selection bias relative to a larger unlabeled distribution, or
unlabeled data may be out-of-distribution due to environment effects like a minor rotation of
the pen [7]. Furthermore, the literature is highly inconsistent in describing such adverse scenarios,
complicating the ideation of solutions. Terms that are used include dataset shift [7], covariate
shift, concept drift [8], and class distribution mismatch [5], amongst others. Problematic data
can often be viewed from the perspective of related areas such as domain adaptation [9],
transfer learning [10] or out-of-distribution detection [11]. The exact relation to these areas,
however, remains unclear. Hence, the lack of unified terminology on the conditions and
reasons that cause performance degradation in SSL impedes progress, and applying SSL
in practice becomes a game of chance.
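To illustrate how some of the terms above are commonly distinguished, the following sketch states them in the style of the dataset shift literature such as [7]. The notation is an assumption made for this illustration and is not taken from the cited works: p_l and p_u denote the distributions underlying the labeled and unlabeled data, and Y_l and Y_u the sets of classes occurring in each. Definitions vary between authors, so this is one possible reading rather than the unified terminology that the thesis aims to develop.

```latex
\begin{align*}
\text{covariate shift:} \quad
  & p_l(x) \neq p_u(x), \qquad p_l(y \mid x) = p_u(y \mid x) \\
\text{concept shift (concept drift when it occurs over time):} \quad
  & p_l(y \mid x) \neq p_u(y \mid x), \qquad p_l(x) = p_u(x) \\
\text{class distribution mismatch:} \quad
  & \mathcal{Y}_u \not\subseteq \mathcal{Y}_l
\end{align*}
```

Under this reading, sample selection bias from a limited number of writers would typically appear as a form of covariate shift, provided that the relation between input and label is the same for all writers.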
This thesis focuses primarily on the ideation of an improved state-of-the-art SSL method for
the adverse OnHW time series dataset for handwriting recognition. A minor contribution is the
identification of problematic SSL cases from the literature, together with a unified terminology
and a taxonomy of adverse distributions and algorithmic solutions.
In the first phase, a taxonomy and a summary of possibly transferable solutions for SSL
methods under adverse data are created. Problematic scenarios are each assessed from the
perspective of related fields in order to establish clear relations. For each of the problems,
methods proposed in the literature to counteract them are presented. The selection of these
methods is based on research interest (number of citations) and relevance to the OnHW dataset.
In the second phase, which uses the OnHW dataset, the most severe problems for real-life
application are studied. For an in-depth investigation of adverse data distributions, the basic
problematic cases identified for OnHW are subsequently simulated and analyzed using simple
artificial datasets such as 2D Gaussians, or the CIFAR-10/100 datasets [12] that are predominant
in the literature. Performance degradation is examined using a state-of-the-art SSL method
(e.g. MixMatch [13]) from an existing code base [14]. Subsequently, a suitable method for
countering adverse data distributions, such as MixMOOD [16], Uncertainty-Aware
Self-Distillation [5], or methods based on OOD detection [17], is implemented and evaluated on
both synthetic and real data. The mitigation of degradation will be analyzed. As a supervised
baseline, a CNN architecture that shows the best performance in the OnHW letter
classification task will be used.
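As an illustration of how the adverse cases described above might be simulated on 2D Gaussians, the following is a minimal sketch using NumPy. It is not the implementation used in the thesis. All function names, the exponential selection weighting, the class means and the 15 degree rotation are assumptions chosen for this example: the biased sampling mimics sample selection bias (labels from a non-representative subset), and the rotation mimics an environment effect on the unlabeled data.

```python
import numpy as np


def make_gaussians(n_per_class, means, std=1.0, rng=None):
    """Draw one isotropic 2D Gaussian blob per class; returns (X, y)."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.concatenate(
        [rng.normal(loc=m, scale=std, size=(n_per_class, 2)) for m in means]
    )
    y = np.repeat(np.arange(len(means)), n_per_class)
    return X, y


def biased_labeled_subset(X, y, n_labeled, direction, strength, rng):
    """Sample selection bias (hypothetical scheme): points with a larger
    projection onto `direction` are more likely to receive a label."""
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    scores = strength * (X @ direction)
    weights = np.exp(scores - scores.max())  # shifted for numerical stability
    probs = weights / weights.sum()
    idx = rng.choice(len(X), size=n_labeled, replace=False, p=probs)
    return X[idx], y[idx]


def rotate(X, degrees):
    """Rotate all points around the origin by the given angle."""
    theta = np.deg2rad(degrees)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    return X @ R.T


rng = np.random.default_rng(0)
means = [(-2.0, 0.0), (2.0, 0.0)]  # two classes, separated along the x-axis

# Pool from which the labeled set is drawn with selection bias along the y-axis.
X_pool, y_pool = make_gaussians(500, means, rng=rng)
X_lab, y_lab = biased_labeled_subset(
    X_pool, y_pool, n_labeled=40, direction=(0.0, 1.0), strength=1.5, rng=rng
)

# Unlabeled data from the same classes, but observed under a small rotation.
X_unl, _ = make_gaussians(500, means, rng=rng)
X_unl_rot = rotate(X_unl, 15.0)

# Simple diagnostic: distance between the labeled mean and the pool mean.
selection_shift = np.linalg.norm(X_lab.mean(axis=0) - X_pool.mean(axis=0))
print(f"mean shift caused by biased labeling: {selection_shift:.2f}")
```

The labeled set, the unshifted pool and the rotated unlabeled set can then be passed to any SSL method to compare its performance with and without each shift.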



[1] J. E. van Engelen and H. H. Hoos, “A survey on semi-supervised learning,” Mach. Learn., vol. 109, no. 2, pp. 373–440,
2020, doi: 10.1007/s10994-019-05855-6.
[2] O. Chapelle, B. Schölkopf, and A. Zien, “Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book reviews],”
IEEE Trans. Neural Netw., vol. 20, no. 3, p. 542, 2009.
[3] P. Ren et al., “A survey of deep active learning,” arXiv, 2020.
[4] A. Oliver, A. Odena, C. Raffel, E. D. Cubuk, and I. J. Goodfellow, “Realistic evaluation of deep semi-supervised learning
algorithms,” Proc. 32nd Int. Conf. Neural Inf. Process. Syst., pp. 3239–3250, 2018, doi: 10.5555/3327144.3327244.
[5] Y. Chen, X. Zhu, W. Li, and S. Gong, “Semi-Supervised Learning under Class Distribution Mismatch,” Proc. 34th AAAI
Conf. Artif. Intell., vol. 34, no. 4, pp. 3569–3576, 2020, doi: 10.1609/aaai.v34i04.5763.
[6] A. Mey and M. Loog, “Improvability through semi-supervised learning: a survey of theoretical results,” arXiv, 2019.
[7] J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, and F. Herrera, “A unifying view on dataset shift in
classification,” Pattern Recognit., vol. 45, no. 1, pp. 521–530, 2012, doi: 10.1016/j.patcog.2011.06.019.
[8] G. I. Webb, R. Hyde, H. Cao, H. L. Nguyen, and F. Petitjean, “Characterizing concept drift,” Data Min. Knowl. Discov.,
vol. 30, no. 4, pp. 964–994, 2016, doi: 10.1007/s10618-015-0448-4.
[9] J. Jiang and C. X. Zhai, “Instance weighting for domain adaptation in NLP,” ACL 2007 – Proc. 45th Annu. Meet. Assoc.
Comput. Linguist., pp. 264–271, 2007.
[10] H. Y. Zhou, A. Oliver, J. Wu, and Y. Zheng, “When semi-supervised learning meets transfer learning: Training
strategies, models and datasets,” arXiv, 2018.
[11] X. Zhao, K. Krishnateja, R. Iyer, and F. Chen, “Robust semi-supervised learning with out of distribution data,” arXiv,
[12] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” 2009.
[13] D. Berthelot, N. Carlini, I. Goodfellow, A. Oliver, N. Papernot, and C. Raffel, “MixMatch: A holistic approach to semi-supervised
learning,” Advances in Neural Information Processing Systems, 2019.
[14] J. Goschenhofer, R. Hvingelby, D. Rügamer, J. Thomas, M. Wagner, and B. Bischl, “Deep Semi-Supervised Learning for
Time Series Classification,” arXiv, 2020.
[15] F. Ott, M. Wehbi, T. Hamann, J. Barth, B. Eskofier, and C. Mutschler, “The OnHW Dataset: Online Handwriting
Recognition from IMU-Enhanced Ballpoint Pens with Machine Learning,” Proc. ACM Interactive, Mobile, Wearable Ubiquitous
Technol., vol. 4, no. 3, 2020, doi: 10.1145/3411842.
[16] S. Calderon-Ramirez, L. Oala, J. Torrents-Barrena, S. Yang, A. Moemeni, W. Samek, and M. A. Molina-Cabello,
“MixMOOD: A systematic approach to class distribution mismatch in semi-supervised learning using deep dataset dissimilarity
measure,” arXiv, 2020.
[17] S. Liang, Y. Li, and R. Srikant, “Enhancing the Reliability of Out-of-distribution Image Detection in Neural Networks,”
International Conference on Learning Representations (ICLR), 2018.