Matthias Zürl (M.Sc.), Franz Köferl (M.Sc.), Prof. Dr. Björn Eskofier
02 / 2022 – 08 / 2022
Biologists are constantly trying to improve their understanding of animal behavior. Captive animals are of particular interest because such an improved understanding can directly be used to improve their living conditions and their psychological and physical well-being. To gain this understanding, one must observe these animals, ideally over long periods of time, to obtain detailed and meaningful insights and statistics. To achieve this, an expert has to record all animal activity of interest manually, either on-site or via video recordings. This is a very time-consuming process and still does not allow for continuous monitoring. Automating this task would therefore allow biologists to gather vast amounts of data much more easily and thus conduct large-scale studies that lead to a better understanding of animal behavior.
A simple and non-invasive method of monitoring animal behavior is to record it on video and then automatically extract all relevant information from the data. Regardless of the investigated behavior, the information is especially valuable if it can be derived for each animal individually. This means that the data extraction process will necessarily involve (a) finding where there are animals in a frame and (b) (re-)identifying each of them. Automating (a) has become a reasonably simple task thanks to powerful object detection algorithms [1, 2]. Re-identification (Re-ID), however, is more challenging. When large amounts of labeled data are available for the individuals that are to be identified, we have, in effect, a classification problem that is solved fairly easily with the help of modern machine learning tools. In this context, previous works were able to achieve strong results on a number of polar bear datasets from different zoos that were created in-house. But this still requires an upfront human annotation effort for each new enclosure. While preferable to the all-manual approach, an unsupervised algorithm that only needs to be trained on already existing data would be ideal.
Unsupervised Re-ID has been explored fairly extensively for humans (especially in the context of surveillance), but there exists comparatively little work on animals, be that in the wild or in captivity. In person Re-ID, the goal is to determine which person in a set of gallery images (a set of images containing persons of interest) matches a given query image (e.g. a recording from a security camera). Generally, these approaches try to find a mapping from person images to a feature space in which images of the same person are mapped closely together regardless of pose, camera orientation, lighting conditions, etc., while images of different people are mapped further apart. If this holds, a query image is matched to a gallery image by embedding all images in the feature space and selecting the gallery embedding closest to the query embedding. The critical point is the construction of the mapping. Typically, algorithms leverage deep neural networks and large datasets to find a mapping with the desired properties. Crucially, these datasets contain a large variety of different individuals, which allows the network to generalize these properties when it is trained to enforce them on the given data. In the case of our polar bear datasets, however, the available data (in particular labeled images of different individuals) is limited, and consequently naive extensions of successful supervised models to an unsupervised setting have proven unfruitful. On the other hand, framing Re-ID as a retrieval process, in the sense of the aforementioned query-gallery matching, poses a harder problem than the one we actually have to solve. Person retrieval is heavily oriented towards surveillance, where only a handful of images of each target are available. In the case of captive animals, though, we are dealing with a fixed set of individuals that we want to identify over a long period of time. We can therefore expect to collect many more (unlabeled) images, which can then be identified collectively. In this sense, we are dealing with an unsupervised, closed-world Re-ID problem.
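The query-gallery matching described above can be illustrated with a minimal sketch. It assumes that embeddings have already been computed by some feature extractor and are non-zero vectors. The function names, the labels `bear_a` and `bear_b`, the toy vectors, and the choice of cosine similarity as the closeness measure are all illustrative assumptions, not part of the proposal.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_query(query_embedding, gallery):
    """Return (label, similarity) of the gallery entry closest to the query.

    `gallery` is a list of (label, embedding) pairs.
    """
    best_label, best_score = None, -math.inf
    for label, embedding in gallery:
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy embeddings (hypothetical values, for illustration only).
gallery = [
    ("bear_a", [0.9, 0.1, 0.0]),
    ("bear_b", [0.1, 0.8, 0.3]),
]
query = [0.8, 0.2, 0.1]

label, score = match_query(query, gallery)
print(label, round(score, 3))
```

In this toy setting the query is assigned to the gallery identity whose embedding points in the most similar direction. In practice, the quality of the result depends entirely on how well the learned mapping separates individuals.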
The goals of this thesis are therefore (a) to take an existing unsupervised Re-ID algorithm that achieves state-of-the-art results in the person domain and apply it to our polar bear datasets and (b) to test, based on the results, a number of possible improvements that make it more amenable to our concrete use case. Candidates for such improvements include, but are not limited to:
1. Training on data from several zoos simultaneously, i.e. combining several training domains into one, to reduce the chance of overfitting.
Master Research Proposal Nr: 2030
2. Regularizing the training process by training only a few of the final layers of a pretrained model.
3. Employing clustering algorithms as part of the application methodology to better make use of the many images we have of each individual.
Finally, (c) a further goal is to examine the model’s few-shot learning capabilities by incorporating just a small amount of labeled data from the target domain. We consider any significant reduction in annotation work a worthwhile achievement.
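The idea behind candidate improvement 3, identifying many unlabeled images collectively, can be sketched with a small clustering example. It assumes a closed-world setting in which the number of individuals `k` is known, and it uses plain k-means with a deterministic farthest-point initialization as a stand-in. The proposal does not commit to a specific clustering algorithm, and all names and toy embeddings below are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(component) / n for component in zip(*points)]


def init_centroids(points, k):
    """Farthest-point initialization: deterministic, starts at the first point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iterations=20):
    """Cluster `points` into `k` groups; returns (assignments, centroids)."""
    centroids = init_centroids(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign every embedding to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = mean(members)
    return assignments, centroids


# Toy 2-D embeddings of two individuals (hypothetical values).
embeddings = [
    [0.9, 0.1], [1.0, 0.0], [0.8, 0.2],
    [0.1, 0.9], [0.0, 1.0], [0.2, 0.8],
]
assignments, centroids = kmeans(embeddings, k=2)
print(assignments)
```

Each resulting cluster would then correspond to one individual, so that a single decision (or a single labeled example) per cluster could identify all images in it at once.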
[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[2] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969.
[3] M. Ye, J. Shen, G. Lin, T. Xiang, L. Shao, and S. C. Hoi, “Deep learning for person re-identification: A survey and outlook,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[4] R. Dirauf, “Video-based re-identification of captive polar bears,” Master’s thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2022.
[5] X. Lin, P. Ren, C.-H. Yeh, L. Yao, A. Song, and X. Chang, “Unsupervised person re-identification: A systematic survey of challenges and solutions,” arXiv preprint arXiv:2109.06057, 2021.
[6] S. Li, J. Li, W. Lin, and H. Tang, “Amur tiger re-identification in the wild,” arXiv e-prints, pp. arXiv–1906, 2019.
[7] Y. Fu, Y. Wei, G. Wang, Y. Zhou, H. Shi, and T. S. Huang, “Self-similarity grouping: A simple unsupervised cross domain adaptation approach for person re-identification,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 6112–6121.