Jan Boden
Advisors
Rebecca Lennartz (M.Sc.), Jitin Jami (M.Sc.), Prof. Dr. Björn Eskofier
Duration
03/2025 – 09/2025
Abstract
In football, competition for top athletes catalyzes the adoption of Artificial Intelligence (AI) through data-driven talent acquisition. For example, AI is used to support talent scouts [1] or recognize promising individuals [2]. On-field data often lack repeatability under standardized conditions, while human observations are prone to bias and inconsistencies. The Igloo 360-degree environment introduces a controlled and reproducible way of testing player skill by analyzing movement or reaction times. Consequently, standardized measurements of player skill level, real-time feedback, and recommendations for improvement may follow. Eventually, this setup enables fairer access to development opportunities based on performance. For this purpose, suitable algorithms, such as pose estimation and temporal event segmentation, must be evaluated in the Igloo environment. However, the overhead fisheye camera and the low-light environment challenge the performance levels of established algorithms.
Low-light enhancement techniques, such as histogram equalization [3, 4, 5] or zero-shot methods like ZeroDCE++ [6], are used to improve object separability. Approaches for 2-D pose estimation utilize CNNs [7] or Transformers [8] and can be extended to infer 3-D keypoints [9]. MMPose [10] offers a comprehensive suite of tools for pose estimation. Together with synthetic datasets like THEODORE+ [11], pose estimation methods can be adapted to specific use cases. Temporal action segmentation partitions video into sequences of distinct actions via CNNs and Transformers [12, 13, 14, 15]. In the Igloo context, these actions may include ball retention or shots on goal.
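To illustrate the low-light enhancement step, the following is a minimal NumPy sketch of global histogram equalization. It is not tied to any of the cited bi-histogram variants [3, 4, 5]; the "dark frame" below is synthetic and purely illustrative.

```python
import numpy as np

def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit grayscale image.

    Spreads the cumulative intensity distribution over the full
    [0, 255] range, increasing contrast in underexposed frames.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # CDF value of the first occupied bin
    # Map each intensity level through the normalized CDF.
    lut = np.clip(
        np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255), 0, 255
    ).astype(np.uint8)
    return lut[img]

# Example: a synthetic underexposed frame with intensities in [0, 60).
dark = np.random.default_rng(0).integers(0, 60, size=(120, 160), dtype=np.uint8)
enhanced = equalize_histogram(dark)
```

The cited methods refine exactly this mapping, e.g. by equalizing sub-histograms separately to preserve mean brightness [3, 5].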
By fusing the methods above, this study seeks to answer two questions. First, can meaningful and interpretable metrics, such as frequency of ball interactions, variability of joint speeds, or angular deviations during specific actions, be reliably extracted? And second, do these data correlate meaningfully with player skill levels? Overall, the aim of this research is to explore the feasibility of AI-based player skill level assessment combining pose estimation and temporal action segmentation.
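As a sketch of the metric-extraction idea, the snippet below computes two of the candidate metrics, joint speed variability and a joint angle, from a 2-D keypoint trajectory. The keypoint array is synthetic (a random walk standing in for pose-estimator output), and the COCO-style joint indices (hip 11, knee 13, ankle 15) are an assumption for illustration only.

```python
import numpy as np

# Hypothetical keypoint trajectory of shape (frames, joints, xy). In practice
# this would come from a pose estimator; here a random walk simulates motion.
rng = np.random.default_rng(1)
keypoints = rng.normal(size=(50, 17, 2)).cumsum(axis=0)
fps = 25.0

def joint_speed_variability(traj: np.ndarray, fps: float) -> np.ndarray:
    """Standard deviation of per-frame joint speeds (pixels/s), per joint."""
    velocities = np.diff(traj, axis=0) * fps       # (frames-1, joints, 2)
    speeds = np.linalg.norm(velocities, axis=-1)   # (frames-1, joints)
    return speeds.std(axis=0)                      # (joints,)

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle at joint b (degrees) between segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

variability = joint_speed_variability(keypoints, fps)
# Knee angle in the first frame, assuming COCO ordering: hip=11, knee=13, ankle=15.
knee = joint_angle(keypoints[0, 11], keypoints[0, 13], keypoints[0, 15])
```

Correlating such per-joint statistics, aggregated over segmented actions, with known skill levels is the second question this study addresses.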
References
[1] Geraint Evans. Discovering the next generation of soccer talent through ai and the amateur game. https://www.forbes.com/sites/drgeraintevans/2021/01/31/discovering-the-next-generation-of-soccer-talent-through-ai-and-the-amateur-game/, 2021. Accessed: 2025-04-12.
[2] Jack Bantock. Top soccer clubs are using an ai-powered app to scout future stars. https://edition.cnn.com/2024/03/01/tech/aiscout-app-soccer-scouting-spc-intl/index.html, 2024. Accessed: 2025-04-12.
[3] Yeong-Taeg Kim. Contrast enhancement using brightness preserving bi-histogram equalization. IEEE transactions on Consumer Electronics, 43(1):1-8, 1997.
[4] Yu Wang, Qian Chen, and Baomin Zhang. Image enhancement based on equal area dualistic sub-image histogram equalization method. IEEE transactions on Consumer Electronics, 45(1):68-75, 1999.
[5] Soong-Der Chen and Abd Rahman Ramli. Minimum mean brightness error bi-histogram equalization in contrast enhancement. IEEE transactions on Consumer Electronics, 49(4):1310-1319, 2003.
[6] Chongyi Li, Chunle Guo, and Chen Change Loy. Learning to enhance low-light image via zero-reference deep curve estimation. IEEE transactions on pattern analysis and machine intelligence, 44(8):4225-4238, 2021.
[7] Yanjie Li, Shoukui Zhang, Zhicheng Wang, Sen Yang, Wankou Yang, Shu-Tao Xia, and Erjin Zhou. Tokenpose: Learning keypoint tokens for human pose estimation. In Proceedings of the IEEE/CVF International conference on computer vision, pages 11313-11322, 2021.
[8] Yufei Xu, Jing Zhang, Qiming Zhang, and Dacheng Tao. Vitpose: Simple vision transformer baselines for human pose estimation. Advances in neural information processing systems, 35:38571-38584, 2022.
[9] Dario Pavllo, Christoph Feichtenhofer, David Grangier, and Michael Auli. 3d human pose estimation in video with temporal convolutions and semi-supervised training. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7753-7762, 2019.
[10] MMPose Contributors. Openmmlab pose estimation toolbox and benchmark. https://github.com/open-mmlab/mmpose, 2020.
[11] Jingrui Yu, Tobias Scheck, Roman Seidel, Yukti Adya, Dipankar Nandi, and Gangolf Hirtz. Human pose estimation in monocular omnidirectional top-view images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 6410-6419, June 2023.
[12] Hui Zan and Gang Zhao. Human action recognition research based on fusion ts-cnn and lstm networks. Arabian Journal for Science and Engineering, 48(2):2331-2345, 2023.
[13] Khaled Bayoudh, Fayçal Hamdaoui, and Abdellatif Mtibaa. An attention-based hybrid 2d/3d cnn-lstm for human action recognition. In 2022 2nd international conference on computing and information technology (ICCIT), pages 97-103. IEEE, 2022.
[14] Wenhui Li, Weizhi Nie, and Yuting Su. Human action recognition based on selected spatiotemporal features via bidirectional lstm. IEEE Access, 6:44211-44220, 2018.
[15] Min Yang, Huan Gao, Ping Guo, and Limin Wang. Adapting short-term transformers for action detection in untrimmed videos. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18570-18579, 2024.