ID 2411: Investigation of Robustness of Single-View 3D Pose Estimators under Challenging Conditions

Master’s Thesis

3D human pose estimation is a crucial aspect of computer vision with applications ranging from guided movement exercises in the home environment to supportive systems for athletes and coaches. In recent years, pose estimation that incorporates depth information, e.g. by locating humans in three-dimensional space, has advanced extensively. Established methods include triangulation, uplifting from 2D to 3D space, and direct prediction of spatiotemporal information of human joints. Substantial improvements in deep learning methods have led to highly stable and accurate models that require only a single camera view.

However, the robustness of existing estimators under challenging conditions, such as difficult camera angles, varying lighting conditions, and the presence of occlusions, remains a significant concern in both 2D and 3D approaches. Numerous advances, such as dual-stream spatiotemporal transformer models, modulated graph convolutional networks (GCNs), and networks that combine transformers with GCNs, have been introduced to enhance the performance and robustness of single-view 3D pose estimation.

This thesis proposes to investigate and enhance the robustness of single-view 3D pose estimators by evaluating their performance under different conditions and testing novel approaches to improve the stability of pose predictions in a temporally coherent context. Prediction pipeline design choices, the generalization of different data augmentation techniques to in-the-wild data, and issues of overfitting in existing pipelines will be examined in detail.

Tasks

  • Investigate state-of-the-art single-view 3D pose estimation approaches.
  • Evaluate the effectiveness and accuracy of existing estimators under challenging conditions, including difficult camera angles, varying lighting conditions, and varying image sizes, with a particular focus on the depth dimension.
    • Compare the performance of uplifting approaches versus direct estimators.
    • Investigate the effect of cropping on the robustness and stability of pose estimators by evaluating performance with and without cropping.
  • Introduce an approach to enhance the stability and robustness of single-view 3D pose estimators over time.
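
As an illustration of how the evaluation tasks above could be quantified, the following minimal sketch computes three commonly used measures: the mean per-joint position error (MPJPE), the mean absolute error along the depth axis, and a simple temporal jitter score. It is not part of the thesis specification. The function names, the data layout (a list of frames, each a list of (x, y, z) joint tuples), and the assumption that the depth axis is index 2 are all hypothetical choices made for this example.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance
    between predicted and ground-truth joints over all frames."""
    dists = [math.dist(p, g)
             for pred_frame, gt_frame in zip(pred, gt)
             for p, g in zip(pred_frame, gt_frame)]
    return sum(dists) / len(dists)


def mean_depth_error(pred, gt, axis=2):
    """Mean absolute error along one axis.
    Assumption: axis 2 (z) is the camera depth direction."""
    errs = [abs(p[axis] - g[axis])
            for pred_frame, gt_frame in zip(pred, gt)
            for p, g in zip(pred_frame, gt_frame)]
    return sum(errs) / len(errs)


def jitter(pred):
    """Mean magnitude of the second temporal difference of each joint.
    Lower values indicate smoother predictions; needs at least 3 frames."""
    accs = []
    for t in range(1, len(pred) - 1):
        for prev, cur, nxt in zip(pred[t - 1], pred[t], pred[t + 1]):
            acc = [a - 2 * b + c for a, b, c in zip(prev, cur, nxt)]
            accs.append(math.hypot(*acc))
    return sum(accs) / len(accs)
```

Comparing these values for the same sequence with and without cropping, or for an uplifting versus a direct estimator, would give one possible basis for the comparisons listed above. The results are expressed in whatever unit the input coordinates use.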

Requirements

  • Experience in computer vision
  • Familiarity with frameworks such as TensorFlow, Keras, and PyTorch

Supervisors

Please use the application form to apply for the topic. We will then get in contact with you.