Anes Redžepagić

Bachelor's Thesis

Mixed Reality Data Collection for Industrial ML

Christoffer Löffler (M.Sc.), Prof. Dr. Björn Eskofier

08/2019 – 12/2020

Despite the ongoing automation of modern production processes, manual labor continues to be necessary due to its flexibility and ease of deployment [1]. However, while automated processes assure quality and traceability, manual labor introduces gaps into the quality assurance process [1]. This is unwanted and sometimes intolerable. Fraunhofer IIS investigates cognitive sensors, e.g. for smart screwdrivers, to detect errors in some manual tasks. The idea is to prevent quality issues by analyzing the data collected by embedded sensors and detecting errors in the assembly process. To achieve high reliability and energy efficiency, an embedded analysis of the data via machine learning algorithms is currently under development. One approach, e.g. for tracking tightening tools in industrial assembly, monitors processes using inertial sensors that we attach as add-ons to the tightening tools. However, training any machine-learning-based embedded classification algorithm requires large amounts of data from different tasks, tools and environments.

Collecting data for training AI algorithms is generally associated with high costs in both time and labor [4]. While employing unskilled labor for data collection and labeling might seem beneficial, instructing and supervising these workers can become much more time-consuming. One way to guide untrained workers with comprehensible instructions and tutorials is with the assistance of Augmented Reality (AR), e.g. the Microsoft HoloLens mixed reality head-mounted display (HMD) [6]. AR-based training in general is a safe and rather successful alternative to common training methods in the industry [2]. Large corporations like Japan Airlines, Volvo and others have already been using the HoloLens in particular for this purpose [3]. However, in practice, manual labeling of raw sensor data streams without relying on auxiliary data, such as video or positional data, is very error-prone [8] or even impossible. This is partly due to artifacts introduced by workers who handle the tools in unusual ways, and partly due to insufficiencies in the auxiliary data. Thus, the data needs to be labeled automatically, in a way that clearly shows which action it represents and how, e.g. at which position or with which timing, this action has been performed.

Goal of this thesis

By using modern augmented reality headsets such as the Microsoft HoloLens, we can instruct and supervise the data collection process. The instructions can be given as 3D overlays on the current workpiece, so that untrained workers can perform new tasks without additional instructions or supervision. The 3D tracking capabilities will be extended to track the position and orientation of the tool that is used to perform the current task. Additionally, we will record video so that data scientists can better understand the automatically labeled data.

The thesis consists of the implementation and evaluation of three modules:

  1. A visualization of instructions on the HMD that is sufficient to guide workers through the steps of a manual production process. The visualization must be detailed enough to allow a segmentation of the tasks into their constituent steps, so that automatic data labeling can be applied. For this task the game engine Unity 3D should be used. Positions of objects relative to coded markers, e.g. QR codes, should be calculated via Vuforia [7]. Since the focus does not lie on the training of workers but on collecting labeled data for machine learning, the evaluation of this module does not require qualitative UI/UX studies of learning effects or similar, but should instead assess the performance of the automated data collection. Because the sensor data will be labeled based on the visualized step, this module should be evaluated by how well the generated labels are synchronized in time with ground-truth (video- and tracking-based) labels.
  2. Position tracking. The position of the hand-held tool, which is used during the task, is tracked relative to the worker using the HoloLens. Since the HoloLens can recognize only two types of hand gestures and no objects with any reliability [5], a marker-based approach should be implemented. This should be built on top of the Vuforia framework, which is sufficient for this purpose. An evaluation should identify significant errors between the real and the calculated positions (e.g. if the processing time leads to delays). Hence, the focus in this work will be on the accuracy of the positional tracking and the synchronization of the sensor data and labels, for which video recording and an external positional tracking system, e.g. ART, Nexonar or QualiSys, will be used to gather ground-truth data.
  3. Data recording and export. The positional tracking data of the HoloLens and the embedded sensor's data, i.e., the data generated by our existing sensor platform, need to be recorded and temporally synchronized or aligned. The recorded data must be labeled with the visualized/instructed steps and exported in a common format, e.g. JSON or CSV, or in a format for the tool Nova.
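
The labeling and export described in the third module can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the tuple layouts, the step names and the constant clock offset are all assumptions made for illustration. Each sensor sample is shifted onto the HMD time base and then receives the label of the instruction step that was displayed at that time.

```python
import bisect
import csv
import io


def label_samples(samples, steps, clock_offset_s=0.0):
    """Attach the instructed step to each sensor sample by timestamp.

    samples: list of (timestamp_s, value) tuples from the sensor clock.
    steps: list of (start_s, end_s, label) tuples on the HMD clock,
        sorted by start time and non-overlapping (hypothetical layout).
    clock_offset_s: assumed constant offset added to sensor timestamps
        to align them with the HMD clock.
    Samples outside every step interval are labeled "unlabeled".
    """
    starts = [start for start, _, _ in steps]
    labeled = []
    for t, value in samples:
        t_aligned = t + clock_offset_s
        # Index of the last step whose start time is <= t_aligned.
        i = bisect.bisect_right(starts, t_aligned) - 1
        label = "unlabeled"
        if i >= 0 and t_aligned < steps[i][1]:
            label = steps[i][2]
        labeled.append((t_aligned, value, label))
    return labeled


def to_csv(labeled):
    """Serialize labeled samples as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp_s", "value", "label"])
    writer.writerows(labeled)
    return buf.getvalue()


# Hypothetical step intervals and sensor readings.
steps = [(0.0, 2.0, "pick_up_tool"), (2.0, 5.0, "tighten_screw_1")]
samples = [(0.5, 0.01), (2.5, 0.8), (5.5, 0.02)]
labeled = label_samples(samples, steps)
csv_text = to_csv(labeled)
```

A constant offset is the simplest alignment model; if the two clocks drift over a recording, a time-varying correction would be needed instead. Half-open intervals (start inclusive, end exclusive) keep a sample at a step boundary from receiving two labels.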


  1. Boothroyd, G. (2005). Assembly Automation and Product Design (2nd ed.). London, UK: Taylor & Francis Group.
  2. Werrlich, S., Eichstetter, E., Nitsche, K., & Notni, G. (2017). An Overview of Evaluations Using Augmented Reality for Assembly Training Tasks. International Journal of Computer and Information Engineering, 11(10).
  3. Carey, S. (2018, May 16). How Microsoft HoloLens is being used in the real world. Retrieved from
  4. Roh, Y., Heo, G., & Whang, S. E. (2018). A Survey on Data Collection for Machine Learning: a Big Data – AI Integration Perspective. Unpublished manuscript, School of Electrical Engineering, KAIST, Daejeon, Korea.
  5. Chen, J. Y. C., & Fragomeni, G. (Eds.). (2018). Virtual, Augmented and Mixed Reality: Interaction, Navigation, Visualization, Embodiment, and Simulation. Cham, Switzerland: Springer International Publishing AG.
  6. Sailer, C. (2018, June 20). Mixed Reality im Einsatz: data experts trainiert kluge Köpfe. Retrieved from
  7. Evans, G., Miller, J., Pena, M. I., MacAllister, A., & Winer, E. (2017). Evaluating the Microsoft HoloLens through an augmented reality assembly application. Proc. SPIE.
  8. Yan, R., Yang, J., & Hauptmann, A. (2003). Automatically Labeling Video Data Using Multi-class Active Learning. In: Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV 2003), 2-Volume Set, 0-7695-1950-4/03.