Roman Hucke

Bachelor's Thesis

Analysis of real-time person detection performance on edge devices utilizing state-of-the-art detection methods

Advisors
Franz Köferl (M.Sc.), René Raab (M.Sc.), Prof. Dr. Björn Eskofier

Duration
06 / 2022 – 11 / 2022

Abstract
The detection of human beings plays a crucial role in many application areas, including person identification, abnormal event detection, and fall detection for elderly people [1]. Edge devices such as accelerator-based single-board computers (SBCs) are an increasingly popular and viable way to implement such detection tasks [2]. This is due not only to their low power consumption and high performance [2], but also to the rise of the Internet of Things (IoT): more and more data is produced at the edge of the network, which makes processing it there appealing [3]. While highly accurate and fast detection is possible with state-of-the-art methods and near-unlimited resources, reaching acceptable performance under resource constraints remains a challenge [3]. Consequently, choosing the best-performing methods without regard to efficiency or resource cost would be inappropriate, since the available computing power, energy, and processing time are limited [3].

Addressing this challenge is a central part of a joint project of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), DATEV, and the Deutsches Museum Nuremberg. The overall aim is to construct a technology-themed exhibition that builds detailed visitor profiles from information such as age, sex, body measurements, and interests inferred from visitor focus or observable emotional state. This is achieved through real-time tracking of visitors via camera-based person detection on NVIDIA Jetson hardware.
The currently deployed system uses an NVIDIA Jetson Xavier NX board integrated in the Boxer8251AI BOX PC by AAEON Technology Inc. and a DFK 37BUX178 USB 3.1 color industrial camera by The Imaging Source Europe GmbH. Its performance, in terms of detection rate and inference speed, remains unsuitable for the task at hand. This raises a number of questions on how to improve the system's overall performance. Firstly, we need to evaluate whether the simulated results of related research [2] transfer to our target application. Additionally, methods that aim to improve performance in specific target domains, e.g., Transfer Learning [4] and Domain Adaptation [5], should be considered as a way to increase the detection rate. Lastly, we will consider suitable ways to adjust and optimize the system by implementing methods that speed up the detection process: exploiting GPU parallelization [6], optimizing the tracking process [7], and further improving existing processes [8].

The goal of this thesis is to investigate the performance of the "You Only Look Once" (YOLO) network [8], a widespread object detection model [2], with regard to real-time person detection and tracking. The analysis will be confined to a specific, unchanging environment with a number of fixed cameras that record, from different angles, various subjects traversing said environment in a consistent way. We first evaluate the system and its potential performance against related works and their results, and subsequently implement and test the aforementioned methods to further improve detection rate and inference speed. To ensure reliable results throughout this analysis, a customized study that matches the prevailing conditions will accompany the process.
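
The two quantities named above, detection rate and inference speed, can be illustrated with a minimal sketch. It is not the evaluation code of the project. It assumes that "detection rate" means the fraction of annotated persons matched by a predicted box with an intersection over union (IoU) of at least 0.5, and that boxes are given as (x1, y1, x2, y2) tuples. The function names, the threshold, and the `detector` callable are hypothetical placeholders.

```python
import time


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_rate(ground_truth, predictions, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by a prediction (greedy matching).

    Assumption: each prediction may match at most one ground-truth box.
    """
    if not ground_truth:
        return 0.0
    unused = list(predictions)
    matched = 0
    for gt in ground_truth:
        # Pick the still-unused prediction that overlaps this box the most.
        best = max(unused, key=lambda p: iou(gt, p), default=None)
        if best is not None and iou(gt, best) >= iou_threshold:
            unused.remove(best)
            matched += 1
    return matched / len(ground_truth)


def frames_per_second(detector, frames):
    """Average throughput of a hypothetical `detector` callable over `frames`."""
    start = time.perf_counter()
    for frame in frames:
        detector(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

In this sketch, `detection_rate` would be averaged over all annotated frames of a recording, while `frames_per_second` would wrap the actual network call on the target device. Warm-up runs and data-transfer time are deliberately left out and would have to be handled in a real measurement.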

References

[1] M. Paul, M. E. Haque, and S. Chakraborty, “Human detection in surveillance videos and its applications-a review,” p. 176, 2013. [Online]. Available: http://asp.eurasipjournals.com/content/2013/1/176
[2] H. Feng, G. Mu, S. Zhong, P. Zhang, and T. Yuan, “Benchmark Analysis of YOLO Performance on Edge Intelligence Devices,” Cryptography, vol. 6, p. 16, 4 2022. [Online]. Available: https://www.mdpi.com/2410-387X/6/2/16
[3] Z. Huang, S. Yang, M. C. Zhou, Z. Gong, A. Abusorrah, C. Lin, and Z. Huang, “Making accurate object detection at the edge: review and new approach,” Artificial Intelligence Review, vol. 55, pp. 2245–2274, 3 2022.
[4] K. Weiss, T. M. Khoshgoftaar, and D. D. Wang, “A survey of transfer learning,” Journal of Big Data, vol. 3, 12 2016.
[5] G. Csurka, “Domain adaptation for visual applications: A comprehensive survey,” 2 2017. [Online]. Available: http://arxiv.org/abs/1702.05374
[6] T. Fukagai, K. Maeda, S. Tanabe, K. Shirahata, Y. Tomita, A. Ike, and A. Nakagawa, “Speedup of object detection neural network with GPU,” IEEE Computer Society, 8 2018, pp. 301–305.
[7] E. Bochinski, V. Eiselein, and T. Sikora, “High-speed tracking-by-detection without using image information,” 2017, pp. 1–6.
[8] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal speed and accuracy of object detection,” 4 2020. [Online]. Available: http://arxiv.org/abs/2004.10934
[9] B. G. Han, J. G. Lee, K. T. Lim, and D. H. Choi, “Design of a scalable and fast YOLO for edge-computing devices,” Sensors (Switzerland), vol. 20, pp. 1–15, 12 2020.