Maximilian Rüthlein

Master's Thesis

Interactive segmentation in RGB-D indoor scenes using Deep Learning

Franz Köferl (M.Sc.), Wolfgang Mehringer (M.Sc.), Prof. Dr. Björn Eskofier

10/2019 – 04/2020

Deep neural networks approximate arbitrary functions by optimizing very large parameter sets. This flexibility allows them to model complex relationships, but also bears the risk of capturing noise in the training data, known as overfitting [1]. To avoid this, the parameter count has to be matched during training with an appropriately sized set of labeled samples [2]. To this end, the scientific community has created a variety of datasets [3, 4, 5, 6] that can be used, for example, in pretraining and fine-tuning schemes. Still, data labeling is tedious, as it requires humans to execute [3] or at least observe the process [4] to ensure a high level of quality. For this reason, ongoing effort is put into speeding up tasks that require human interaction. Benenson et al. [3] created 2.5 million instance segmentation masks for RGB images with little human effort. They conducted a large-scale study using their interactive segmentation approach, in which human operators repeatedly corrected the predictions of a network. Their evaluation showed that this approach was three times faster than conventional methods while producing masks of better quality. Up until now, similar annotation approaches have not been applied to geometrically higher-dimensional data such as point clouds or RGB-D data. There are multiple tools [7, 8, 9, 10] implementing traditional annotation/selection methods (box selection, polygon drawing, etc.). These share the inherent difficulty of working on 3D data through a 2D screen, which makes using them time consuming and overall more complex. In general, they lack the more sophisticated techniques seen in recent publications [3, 11, 12], which propose data-based approaches in the domain of RGB image segmentation.
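The correction loop described above can be sketched in a few lines. Note that this is a minimal illustration, not the actual method of Benenson et al. [3]: `predict_mask` and `operator_correction` are hypothetical stand-ins for the segmentation network and the human annotator, and pixels are represented as plain coordinate tuples.

```python
def predict_mask(image, clicks):
    """Stand-in for a segmentation network: here, the predicted mask
    simply contains every pixel that received a positive click."""
    return set(clicks)

def operator_correction(mask, ground_truth):
    """Stand-in for a human annotator: returns one click on a pixel
    the current mask still misses, or None if the mask is accepted."""
    missing = ground_truth - mask
    return next(iter(sorted(missing)), None)

def interactive_segmentation(image, ground_truth, max_rounds=10):
    """Alternate between model prediction and operator correction
    until the annotator is satisfied or the round budget runs out."""
    clicks, mask = set(), set()
    for _ in range(max_rounds):
        mask = predict_mask(image, clicks)
        click = operator_correction(mask, ground_truth)
        if click is None:      # annotator accepts the current mask
            break
        clicks.add(click)      # feed the correction back to the model
    return mask

# Toy example: the "object" consists of three pixel coordinates.
target = {(0, 0), (0, 1), (1, 1)}
print(interactive_segmentation(image=None, ground_truth=target))
```

In the real setting, each correction click is encoded as an extra input channel to the network, so every round refines the full mask rather than a single pixel; the loop structure, however, is the same.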

The aim of this master's thesis is to apply data-based, interactive segmentation to the point cloud annotation process by extending the approach of Benenson et al. [3] to 3D geometric data. For this purpose, a point cloud labeling tool will be implemented that offers the respective functionality. Based on this tool, a study will be carried out comparing the speed and quality of the results with traditional methods.


  1. K. P. Burnham and D. R. Anderson, Model Selection Bias, in Model Selection and Multimodel Inference, 2nd ed., Springer, 1998, pp. 43–45.
  2. B. Everitt and A. Skrondal, The Cambridge Dictionary of Statistics, 4th ed. Cambridge University Press, 2010.
  3. R. Benenson, S. Popov and V. Ferrari, Large-scale interactive object segmentation with human annotators, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11700–11709.
  4. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and F. F. Li, ImageNet: A Large-Scale Hierarchical Image Database, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 248–255.
  5. T. Hackel, N. Savinov, L. Ladicky, J. D. Wegner, K. Schindler and M. Pollefeys, A new Large-scale Point Cloud Classification Benchmark, in ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017, pp. 91–98.
  6. Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang and J. Xiao, 3D ShapeNets: A deep representation for volumetric shapes, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 2015, pp. 1912–1920.
  7. CloudCompare Community. (2019). CloudCompare (version 2.6), [Online]. Available: (visited on 09/23/2019).
  8. P. Cignoni, M. Callieri, M. Corsini, M. Dellepiane, F. Ganovelli and G. Ranzuglia, MeshLab: An open-source mesh processing tool, in 6th Eurographics Italian Chapter Conference 2008 – Proceedings, The Eurographics Association, 2008, pp. 129–136.
  9. Hitachi Automotive And Industry Lab. (2019). Semantic Segmentation Editor, [Online]. Available: https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor (visited on 09/13/2019).
  10. Z. Yan, T. Duckett and N. Bellotto, Online learning for 3D LiDAR-based human detection: experimental analysis of point cloud clustering and classification methods, Autonomous Robots, pp. 1–18, 2019.
  11. S. Majumder and A. Yao, Content-Aware Multi-Level Guidance for Interactive Instance Segmentation, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 11602–11611.
  12. W.-D. Jang and C.-S. Kim, Interactive Image Segmentation via Backpropagating Refinement Scheme, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 5297–5306.
  13. A. Kirillov, K. He, R. Girshick, C. Rother and P. Dollár, Panoptic Segmentation, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 9404–9413.
  14. C. R. Qi, W. Liu, C. Wu, H. Su and L. J. Guibas, Frustum PointNets for 3D Object Detection from RGB-D Data, in Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 918–927.