Simon Dietz

Master's Thesis

Machine Learning Methods for Mixed-Type Time Series Analysis


An Nguyen (M.Sc.), Thomas Altstidl (M.Sc.), Dr. Dario Zanca, Prof. Dr. Björn Eskofier


07/2021 – 01/2022


Data fusion of multiple modalities is commonly performed at some stage between the source (data level) and the target (decision level). However, for mixed-type time series consisting of both sequences of categorical events [1] and real-valued continuous signals, this proves difficult because the two data sources are inherently incompatible: they differ both in value type (categorical versus real-valued) and in temporal structure (discrete event timestamps versus typically sampled signals). This thesis explores and benchmarks methods to overcome this incompatibility on both synthetic and real-world datasets.

Reliable methods for processing mixed-type time series are desirable, since combining information from multiple modalities that originate from the same underlying process can improve model performance and robustness [2]. Furthermore, multimodal machine learning can exploit complementary information across data sources that is unavailable to unimodal approaches [3]. Finally, multimodal systems can be more robust to noisy inputs and can compensate for missing modalities by relying on the remaining ones [4].
Prior research has already proposed methods for fusing both modalities, ranging from data-level fusion [5] to feature-level fusion [6, 7] and decision-level fusion [8]. The following directions are proposed to build on these methods:

  • To process irregularly sampled time series, standard machine learning methods can be adjusted to take the elapsed time between observations into account [9]. Further exploring such time-aware methods might prove beneficial.
  • Some multimodal approaches, such as [10], require time-series inputs of equal or constant length. To bypass this limitation, the sequences of each modality in mixed-type scenarios are often processed separately. Fusing intermediate temporal representations of both modalities might instead help capture inter-modality correlations.
  • Embedded sequences are often simply concatenated. Exploring more elaborate fusion methods, such as multi-level fusion [11] or fusion based on canonical correlation analysis (CCA) [12, 13], might lead to better results.
  • A versatile data-fusion method could solve many of the problems mentioned above. Preprocessing methods such as adaptive segmentation [14] and symbolic aggregate approximation [15] can bring both data types closer together, potentially simplifying the fusion process.
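
CCA-based fusion, named in the list above as an alternative to plain concatenation, can be sketched in a few lines. The following is a minimal NumPy sketch under stated assumptions: each sample's event sequence and continuous signal are assumed to have already been embedded into fixed-length vectors (rows of X and Y; the embedding step is not shown). Both embeddings are projected onto their top-k canonical directions and then fused either by concatenation or by summation. The function names cca_fit and cca_fuse and the small ridge term reg are illustrative choices, not the method of [12, 13] or of this thesis.

```python
import numpy as np


def _inv_sqrt(S: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def cca_fit(X: np.ndarray, Y: np.ndarray, k: int, reg: float = 1e-6):
    """Fit CCA between two views; return (mean, projection) per view and the top-k canonical correlations."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # (Cross-)covariance matrices; reg keeps the auto-covariances invertible
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Sxx), _inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return (mx, Kx @ U[:, :k]), (my, Ky @ Vt.T[:, :k]), s[:k]


def cca_fuse(X, Y, proj_x, proj_y, mode: str = "concat") -> np.ndarray:
    """Project both views into the canonical space, then concatenate or sum them."""
    (mx, Wx), (my, Wy) = proj_x, proj_y
    Zx = (X - mx) @ Wx
    Zy = (Y - my) @ Wy
    return np.hstack([Zx, Zy]) if mode == "concat" else Zx + Zy
```

Usage, with hypothetical embeddings X_events and X_signal that share rows: `px, py, corr = cca_fit(X_events, X_signal, k=2)` followed by `fused = cca_fuse(X_events, X_signal, px, py)`. Concatenation keeps 2k features, while summation keeps k and assumes that the paired canonical components are directly comparable.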

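Symbolic aggregate approximation (SAX), mentioned in the last point of the list, turns a real-valued signal into a string of discrete symbols that could then be handled much like a categorical event sequence. A minimal sketch, assuming NumPy and the standard SAX formulation (z-normalization, piecewise aggregate approximation, and equiprobable Gaussian breakpoints); the helper names paa and sax and the default four-letter alphabet are illustrative assumptions, not the preprocessing chosen in this thesis:

```python
import numpy as np
from statistics import NormalDist


def paa(x: np.ndarray, n_segments: int) -> np.ndarray:
    """Piecewise aggregate approximation: mean of each (roughly) equal-length segment."""
    # np.array_split also handles lengths that are not a multiple of n_segments
    return np.array([seg.mean() for seg in np.array_split(x, n_segments)])


def sax(x, n_segments: int, alphabet: str = "abcd") -> str:
    """Convert a real-valued signal into a SAX word of length n_segments."""
    x = np.asarray(x, dtype=float)
    # z-normalize; a constant signal is mapped to all zeros
    std = x.std()
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # Breakpoints split N(0, 1) into len(alphabet) equiprobable regions
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    # searchsorted returns the index of the region each segment mean falls into
    idx = np.searchsorted(breakpoints, paa(z, n_segments))
    return "".join(alphabet[i] for i in idx)
```

For example, `sax(np.linspace(0, 1, 100), 4)` yields "abcd" for a rising ramp, and the reversed ramp yields "dcba".
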
The lack of publicly available mixed-type time-series datasets often leads to the use of synthetic datasets, as in [16]. For generating event sequences, temporal point processes such as the Hawkes process are useful tools.
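
As an illustration of such a generator, the sketch below simulates a univariate Hawkes process with an exponential kernel, lambda(t) = mu + alpha * sum over past events t_i < t of exp(-beta * (t - t_i)), using Ogata's thinning algorithm. It assumes NumPy; the function name simulate_hawkes and the parameter names are illustrative, and the process is stationary only when alpha / beta < 1. Producing categorical events would require a multivariate (or marked) extension with one intensity per event type, which is not shown here.

```python
import numpy as np


def simulate_hawkes(mu: float, alpha: float, beta: float, T: float, seed=None) -> np.ndarray:
    """Event times on [0, T) of a univariate Hawkes process with exponential kernel,
    simulated with Ogata's thinning algorithm."""
    rng = np.random.default_rng(seed)
    t = 0.0
    excitation = 0.0  # alpha * sum of exp(-beta * (t - t_i)) over past events, at time t
    events = []
    while True:
        # Between events the intensity only decays, so its current value is an upper bound
        lam_bar = mu + excitation
        wait = rng.exponential(1.0 / lam_bar)
        t += wait
        if t >= T:
            break
        # Decay the excitation to the candidate time to get the true intensity there
        excitation *= np.exp(-beta * wait)
        # Accept the candidate with probability lambda(t) / lam_bar
        if rng.uniform() * lam_bar <= mu + excitation:
            events.append(t)
            excitation += alpha  # each accepted event raises future intensity
    return np.array(events)
```

For instance, `simulate_hawkes(mu=0.5, alpha=0.8, beta=1.6, T=100.0, seed=0)` has a branching ratio of alpha / beta = 0.5, so over long horizons the expected number of events is roughly mu * T / (1 - alpha / beta). Because the exponential kernel never increases between events, the intensity at the current time remains a valid bound until the next accepted event, which is what makes this simple form of thinning correct.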


[1] N. Du, et al.: Recurrent Marked Temporal Point Processes: Embedding Event History to Vector. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1555-1564, 2016.
[2] G. Potamianos, et al.: Recent advances in the automatic recognition of audiovisual speech. Proceedings of the IEEE 91.9, 1306-1326, 2003.
[3] T. Baltrušaitis, et al.: Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence 41.2, 423-443, 2019.
[4] X. Yang, et al.: Deep Multimodal Representation Learning From Temporal Data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 1, 5447-5455, 2017.
[5] H.P. Martínez, et al.: Deep Multimodal Fusion: Combining Discrete Events and Continuous Signals. Proceedings of the 16th International Conference on Multimodal Interaction, 34-41, 2014.
[6] S. Xiao, et al.: Modeling the Intensity Function of Point Process Via Recurrent Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence 31, 2374-3468, 2017.
[7] S. Xiao, et al.: Learning Time Series Associated Event Sequences With Recurrent Point Process Networks. IEEE Transactions on Neural Networks and Learning Systems 30.10, 3124-3136, 2019.
[8] B. Cao, et al.: DeepMood: Modeling Mobile Phone Typing Dynamics for Mood Detection. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 747-755, 2017.
[9] I. M. Baytas, et al.: Patient Subtyping via Time-Aware LSTM Networks. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 65-74, 2017.
[10] S. Du, et al.: A Hybrid Method for Traffic Flow Forecasting Using Multimodal Deep Learning. International Journal of Computational Intelligence Systems 13.1, 85-97, 2019.
[11] V. Sindagi, et al.: Multi-Level Bottom-Top and Top-Bottom Feature Fusion for Crowd Counting. Proceedings of the IEEE/CVF International Conference on Computer Vision, 1002-1012, 2019.
[12] Q. Sun, et al.: Feature fusion method based on canonical correlation analysis and handwritten character recognition. 8th Control, Automation, Robotics and Vision Conference, 1547-1552, 2004.
[13] W. Zuobin, et al.: Feature Regrouping for CCA-Based Feature Fusion and Extraction Through Normalized Cut. 21st International Conference on Information Fusion, 2275-2282, 2018.
[14] L. Liu, et al.: Learning Hierarchical Representations of Electronic Health Records for Clinical Outcome Prediction. AMIA Annual Symposium Proceedings 2019, 597-606, 2019.
[15] J. Zhao, et al.: Learning from heterogeneous temporal data in electronic health records. Journal of Biomedical Informatics 65, 105-119, 2017.
[16] L. Feremans, et al.: Pattern-Based Anomaly Detection in Mixed-Type Time Series. Machine Learning and Knowledge Discovery in Databases ECML PKDD, 240-256, 2019.