Immersive Gesture-Based Human Robot Interaction in MR
11 / 2020 – 06 / 2021
Human-computer interfaces (HCI) are an essential part of digital life and come in many different forms. While traditional 2D interfaces are still the norm, there are now many interfaces positioned along the reality-virtuality continuum presented by Milgram et al. in 1994 [Mil+94]. One such extension under active research is the class of immersive, head-mounted
systems. In contrast to traditional virtual reality (VR) interfaces that rely on controllers or
mobile applications with touch interfaces, mixed reality (MR) based implementations usually
rely on camera systems that track hand or finger positions. Newer improvements extend these
with depth information from time-of-flight (TOF) or stereoscopic sensors, increasing detection
stability and precision of distance measurements. Combined with simultaneous localization and
mapping (SLAM) algorithms and visual markers this enables interfaces to overlay digital images
over their physical counterparts precisely for mixed reality applications [Bor+09].
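To make the overlay step concrete, the pose chaining can be sketched with homogeneous transforms: the headset's SLAM-estimated pose in the world frame is composed with a marker pose detected in the headset's camera frame, which places a virtual object at the marker's world position. This is a minimal illustration with hypothetical poses, not code from any of the cited systems.

```python
import numpy as np

def transform(rotation_deg_z, translation):
    """Build a 4x4 homogeneous transform: rotation about the Z axis, then translation."""
    a = np.radians(rotation_deg_z)
    T = np.eye(4)
    T[:3, :3] = [[np.cos(a), -np.sin(a), 0],
                 [np.sin(a),  np.cos(a), 0],
                 [0,          0,         1]]
    T[:3, 3] = translation
    return T

# Hypothetical poses: the headset is localized by SLAM in the world frame,
# and a visual marker is detected in the headset's camera frame.
T_world_head = transform(90, [1.0, 0.0, 1.6])   # headset pose from SLAM
T_head_marker = transform(0, [0.0, 0.0, 0.5])   # marker 0.5 m in front of the camera

# Chaining the transforms yields the marker pose in world coordinates,
# which is where the digital overlay is anchored.
T_world_marker = T_world_head @ T_head_marker
overlay_position = T_world_marker[:3, 3]
```

The same composition generalizes to any number of intermediate frames (e.g. robot base to tool head), which is what lets an MR interface register digital content against physical objects.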
Mixed reality systems have applications in many different fields due to their unique ability to extend the real world with digital information and to enable hands-free usage, since they can process inputs without a physical input device [AAM+13; Alk17]. Their hands-free interaction and head-mounted displays mitigate the need to switch between tools, manuals, and controllers. Generic gesture-based input styles for immersive interactions and their recognition are already well understood [BKM09; Kol+06] and have proven themselves in numerous implementations [Car+11; OCN06]. Nevertheless, they remain novel approaches with very specific use cases and a lack of shared guidelines [BH09]. According to Lee et al., usability studies are needed not only in general but especially for augmented reality interfaces, as traditional methodologies for knowledge transfer cannot be applied directly to MR-based interfaces [Lee+13].
Research on the intuitive usage of these MR-based gesture interfaces becomes even more important when they are used as spatial robot controllers, because they offer the ability to reason about 3D environments without a lossy reduction to 2D displays. They are not restricted to displaying information, but must also encompass control of actuators in widely varying environments.
The usage of human-robot interfaces (HRI) is an effective addition to classical robotic systems [Gre+07; Wil+19], especially in industrial environments. Their usage as interactive controllers, in contrast to fixed programs, enables flexible production lines that keep pace with the trend toward smaller batch sizes [GTK17].
While earlier implementations have been notable for their time [SS00; SJG13; MS13], modern
advances in computer vision, wearables and sensors allow for more immersive and less restrictive
solutions. Modern mixed reality headsets offer more flexibility and better immersion while
being easier to use, giving developers the possibility to improve on old solutions in many ways.
MR-based applications open the way to improved forms of control by embedding interfaces into real contexts and offering feedback that is more tightly integrated with the user's actions. Some forms of
input to control robot movements from MR were researched by D. Puljiz et al. using a HoloLens 1, where they called out a need for MR-based control of tool heads and implemented a simple interface [Pul+19a]. They noted the technical limitations of the built-in gestures of their device, their difficulties tracking hands in close proximity to the robot, and the need to validate this kind of interface with user studies. A later paper by the same authors described a newer implementation that was able to control tool heads and other joints directly, but it still lacked a user study and had only limited modes of interaction due to their continued use of the HoloLens 1 [Pul+19b].
The objective of this thesis is to offer insights into MR-based gesture control of robots by developing a reference implementation based on modern hardware. It is built around a physical robot that can be controlled and instructed from an MR headset using gesture-based interaction principles to execute a variety of physical tasks, while mixed reality overlays enable the usage of natural gestures with immediate feedback. A user study then validates the usability assumptions and allows design implications to be derived, by training and testing usage with participants who have no prior experience with similar systems. A mix of measured quantitative data and qualitative experience reports and interviews serves as the basis for the subsequent analysis. The results enable future implementations to apply data-driven design for immersive interactions.
[AAM+13] Abrar Omar Alkhamisi and Muhammad Mostafa Monowar. “Rise of augmented reality: Current and future application areas”. In: International Journal of Internet and Distributed Systems 1.4 (2013), p. 25.
[Alk17] Mona Alkhattabi. “Augmented reality as e-learning tool in primary schools’ education:
Barriers to teachers’ adoption”. In: International Journal of Emerging
Technologies in Learning (iJET) 12.02 (2017), pp. 91–100.
[BH09] Wolfgang Broll and Jan Herling. “Supporting Reusability of VR and AR Interface
Elements and Interaction Techniques”. In: Virtual and Mixed Reality. Ed. by Randall
Shumaker. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 145–153.
[BKM09] Mark Billinghurst, Hirokazu Kato, and Seiko Myojin. “Advanced interaction techniques
for augmented reality applications”. In: International Conference on Virtual
and Mixed Reality. Springer. 2009, pp. 13–22.
[Bor+09] Monica Bordegoni et al. “Evaluation of a Haptic-Based Interaction System for
Virtual Manual Assembly”. In: vol. 5622. July 2009, pp. 303–312.
[Car+11] Julie Carmigniani et al. “Augmented reality technologies, systems and applications”.
In: Multimedia tools and applications 51.1 (2011), pp. 341–377.
[Gre+07] Scott Green et al. “Human Robot Collaboration: An Augmented Reality Approach
A Literature Review And Analysis”. In: (Jan. 2007).
[GTK17] J. Guhl, S. Tung, and J. Krüger. “Concept and architecture for programming industrial
robots using augmented reality with mobile devices like microsoft HoloLens”.
In: 2017 22nd IEEE International Conference on Emerging Technologies and Factory
Automation (ETFA). 2017, pp. 1–4.
[Kol+06] Mathias Kölsch et al. “Multimodal interaction with a wearable augmented reality
system”. In: IEEE Computer Graphics and Applications 26.3 (2006), pp. 62–71.
[Lee+13] Minkyung Lee et al. “A usability study of multimodal input in an augmented reality
environment”. In: Virtual Reality 17.4 (2013), pp. 293–305.
[Mil+94] Paul Milgram et al. “Augmented reality: A class of displays on the reality-virtuality
continuum”. In: Telemanipulator and Telepresence Technologies 2351 (Jan. 1994).
[MS13] S. Moe and I. Schjølberg. “Real-time hand guiding of industrial manipulator in 5
DOF using Microsoft Kinect and accelerometer”. In: 2013 IEEE RO-MAN. 2013.
[OCN06] S. K. Ong, J. W. S. Chong, and A. Y. C. Nee. “Methodologies for Immersive Robot
Programming in an Augmented Reality Environment”. In: Proceedings of the 4th
International Conference on Computer Graphics and Interactive Techniques in Australasia
and Southeast Asia. GRAPHITE ’06. Kuala Lumpur, Malaysia: Association
for Computing Machinery, 2006, pp. 237–244. isbn: 1595935649.
[Pul+19a] D. Puljiz et al. “Sensorless Hand Guidance Using Microsoft HoloLens”. In: 2019
14th ACM/IEEE International Conference on Human-Robot Interaction (HRI).
2019, pp. 632–633.
[Pul+19b] David Puljiz et al. “General Hand Guidance Framework using Microsoft HoloLens”.
In: arXiv preprint arXiv:1908.04692 (2019).
[SJG13] J. Shen, J. Jin, and N. Gans. “A Multi-view camera-projector system for object
detection and robot-human feedback”. In: 2013 IEEE International Conference on
Robotics and Automation. 2013, pp. 3382–3388.
[SS00] S. Sato and S. Sakane. “A human-robot interface using an interactive hand pointer
that projects a mark in the real work space”. In: Proceedings 2000 ICRA. Millennium
Conference. IEEE International Conference on Robotics and Automation.
Symposia Proceedings (Cat. No.00CH37065). Vol. 1. 2000, 589–595 vol.1.
[Wil+19] T. Williams et al. “Mixed Reality Deictic Gesture for Multi-Modal Robot Communication”.
In: 2019 14th ACM/IEEE International Conference on Human-Robot
Interaction (HRI). 2019, pp. 191–201.