Master's Thesis

Interactive segmentation in RGB-D indoor scenes using Deep Learning

Franz Köferl (M.Sc.), Wolfgang Mehringer (M.Sc.), Prof. Dr. Björn Eskofier

10/2019 – 04/2020


Deep neural networks approximate arbitrary functions by optimizing large sets of parameters.
This flexibility allows them to model complex relationships, but it also bears the risk of capturing
noise in the training data, a phenomenon known as overfitting [1]. To avoid this, the parameter
count has to be matched during training with an appropriately sized set of labeled samples [2].
To this end, the scientific community has created a variety of datasets [3, 4, 5, 6] that can
be used, for example, in pretraining and fine-tuning schemes. Still, data labeling is tedious, as it
requires humans to execute [3] or at least observe the process [4] to ensure a high level of quality.
For this reason, ongoing effort is put into speeding up tasks that require human interaction.
Benenson et al. [3] created 2.5 million instance segmentation masks for RGB images with little
human effort. They conducted a large-scale study using their interactive segmentation approach,
in which human operators repeatedly corrected the predictions of a network. Their evaluation
showed that this approach was three times faster than conventional methods while producing
masks of better quality. Up to now, similar annotation approaches have not been applied to
geometrically higher-dimensional data such as point clouds or RGB-D data. There are multiple
tools [7, 8, 9, 10] implementing traditional annotation and selection methods (box selection,
polygon drawing, etc.). These share the inherent difficulty of working on 3D data through a
2D screen, which makes using them time-consuming and overall more complex. In general, they
lack the more sophisticated techniques seen in recent publications [3, 11, 12] that propose
data-driven approaches in the domain of RGB image segmentation.
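The correction loop of Benenson et al. can be sketched as follows. This is a minimal, self-contained simulation, not their actual method: the `scripted_annotator` and the single-pixel correction are illustrative stand-ins, whereas the real approach feeds annotator clicks back into the network as additional input channels.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

def interactive_annotation(image, predict, get_clicks, max_rounds=50):
    """Correction loop: start from a model prediction, then repeatedly
    apply annotator clicks until the mask is accepted or the budget is spent."""
    mask = predict(image)
    for _ in range(max_rounds):
        clicks = get_clicks(mask)
        if not clicks:                      # annotator accepts the mask
            break
        for (r, c), is_foreground in clicks:
            # Toy correction: flip only the clicked pixel. The real approach
            # instead re-runs the network conditioned on the clicks.
            mask[r, c] = 1 if is_foreground else 0
    return mask

# Simulated session: ground truth is a 5x5 square, the "model" predicts an
# empty mask, and a scripted annotator clicks one wrong pixel per round.
gt = np.zeros((20, 20), dtype=np.uint8)
gt[5:10, 5:10] = 1

def predict(image):
    return np.zeros_like(gt)

def scripted_annotator(mask):
    errors = np.argwhere(mask != gt)
    if len(errors) == 0:
        return []                           # nothing left to correct
    r, c = errors[0]
    return [((r, c), bool(gt[r, c]))]

result = interactive_annotation(None, predict, scripted_annotator)
```

In this toy setup the loop converges to the ground-truth mask, illustrating the key property the study measures: annotation cost is the number of correction rounds, not the effort of drawing the full mask by hand.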
The aim of this master's thesis is to apply data-driven, interactive segmentation to the point
cloud annotation process by extending the approach of Benenson et al. [3] to 3D geometric data.
For this purpose, a point cloud labeling tool offering this functionality will be implemented.
Based on the tool, a study will be carried out comparing the speed and quality of the results
with traditional methods.




  1. K. P. Burnham and D. R. Anderson, "Model Selection Bias," in Model Selection and
     Multimodel Inference, 2nd ed., Springer, 1998, pp. 43–45.
  2. B. Everitt and A. Skrondal, The Cambridge Dictionary of Statistics, 4th ed., Cambridge
     University Press, 2010.
  3. R. Benenson, S. Popov and V. Ferrari, "Large-scale interactive object segmentation
     with human annotators," in The IEEE Conference on Computer Vision and Pattern
     Recognition (CVPR), 2019, pp. 11700–11709.
  4. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and F. F. Li, "ImageNet: A Large-Scale
     Hierarchical Image Database," in 2009 IEEE Conference on Computer Vision and
     Pattern Recognition, IEEE, 2009, pp. 248–255.
  5. T. Hackel, N. Savinov, L. Ladicky, J. D. Wegner, K. Schindler and M. Pollefeys,
     "A new Large-scale Point Cloud Classification Benchmark," in ISPRS Annals of the
     Photogrammetry, Remote Sensing and Spatial Information Sciences, 2017, pp. 91–98.
  6. Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang and J. Xiao, "3D ShapeNets:
     A deep representation for volumetric shapes," in Proceedings of the IEEE Computer
     Society Conference on Computer Vision and Pattern Recognition, IEEE, 2015,
     pp. 1912–1920.
  7. CloudCompare Community. (2019). CloudCompare (version 2.6), [Online]. Available:
     (visited on 09/23/2019).
  8. P. Cignoni, M. Callieri, M. Corsini, M. Dellepiane, F. Ganovelli and G. Ranzuglia,
     "MeshLab: An open-source mesh processing tool," in 6th Eurographics Italian Chapter
     Conference 2008 – Proceedings, The Eurographics Association, 2008, pp. 129–136.
  9. Hitachi Automotive And Industry Lab. (2019). Semantic Segmentation Editor, [Online].
     Available: https://github.com/Hitachi-Automotive-And-Industry-Lab/semantic-segmentation-editor
     (visited on 09/13/2019).
  10. Z. Yan, T. Duckett and N. Bellotto, "Online learning for 3D LiDAR-based human
     detection: experimental analysis of point cloud clustering and classification methods,"
     Autonomous Robots, pp. 1–18, 2019.
  11. S. Majumder and A. Yao, "Content-Aware Multi-Level Guidance for Interactive
     Instance Segmentation," in The IEEE Conference on Computer Vision and Pattern
     Recognition (CVPR), 2019, pp. 11602–11611.
  12. W.-D. Jang and C.-S. Kim, "Interactive Image Segmentation via Backpropagating
     Refinement Scheme," in IEEE Conference on Computer Vision and Pattern Recognition
     (CVPR), 2019, pp. 5297–5306.
  13. A. Kirillov, K. He, R. Girshick, C. Rother and P. Dollár, "Panoptic Segmentation," in
     The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018,
     pp. 9404–9413.
  14. C. R. Qi, W. Liu, C. Wu, H. Su and L. J. Guibas, "Frustum PointNets for 3D
     Object Detection from RGB-D Data," in Conference on Computer Vision and Pattern
     Recognition (CVPR), 2018, pp. 918–927.