Johannes Jablonski

Bachelor's Thesis

Application of data and process analysis techniques for the evaluation of agile university projects

Philipp Dumbach (M.Sc.), An Nguyen (M.Sc.), Prof. Dr. Björn Eskofier


SCRUM is the most widely used method in agile software development. Although it has been in use for nearly 20 years, SCRUM has gained most of its users in recent years [1, 2]. Its widespread use alone makes it attractive for universities to teach students agile methods in software development. The Machine Learning and Data Analytics Lab (MaD Lab) at the Friedrich-Alexander University Erlangen-Nürnberg (FAU) works with a slightly modified version of SCRUM in its student courses [3]. The students’ work was tracked via Git, a version control system, and GitLab, a web-based application for visualizing and managing the overall development process using Git data. The students documented their process in the GitLab Wiki.

This thesis builds on the insights from Alexander Aly’s master’s thesis Data Analysis and Process Mining opportunities with Git and GitLab log data in agile university projects [4]. He developed a tool called GitLab Analyser that visualizes the students’ activities over time using data from Git, GitLab and its Wiki. The tool is meant to help project supervisors detect possible problems in the development process. Aly’s thesis identified a total of 15 potential indicators that can be used to determine the project grades. These indicators were set by their mean or median. Furthermore, the GitLab Analyser can extract event logs of the development process, which can be analyzed in detail in a Process Mining (PM) tool such as Disco or Celonis [4, 5]. Based on these previous findings, this bachelor’s thesis focuses on the following three research questions.

The first goal is to verify the selection of the indicators through follow-up experiments and by checking the indicators with the GitLab Analyser against additional project data from FAU and partner universities. Aly’s thesis had less data available, which made it difficult to produce quantitative evidence. The additional dataset will also be used to check how similar the SCRUM processes in the three university courses are. Another goal of this thesis is to evaluate whether the chosen features are the best fit. The literature shows that methods for feature subset selection and correlation analysis help to determine an optimal feature set. For feature subset selection, filter and wrapper methods can be applied. Feature subset selection also reveals the impact of a feature or a group of features on the analysis result [6]. The Pearson correlation method can be used to detect dependencies between features. With such a technique, relationships between the indicators can be determined mathematically, and each indicator can be weighted individually. Features that are not relevant for the final grade or for a good workflow can also be detected [6]. The impact of each indicator on the final result will be examined in this part of the thesis.
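As a minimal sketch of how a correlation-based filter method could rank indicators, the following Python example computes the Pearson correlation coefficient between each indicator and a project score and sorts the indicators by the absolute value of that coefficient. The indicator names, the toy values, and the helper `rank_indicators` are hypothetical and chosen for illustration only; they are not taken from the GitLab Analyser or from Aly’s thesis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; this sketch
        # treats it as uninformative and returns 0.0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_indicators(indicators, scores):
    """Filter-style ranking: sort indicator names by |r| with the score."""
    r_values = {name: pearson(values, scores) for name, values in indicators.items()}
    return sorted(r_values.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: one value per student group (illustration only).
indicators = {
    "commits_per_sprint": [1, 2, 3, 4, 5],
    "wiki_edits": [2, 2, 2, 2, 2],
    "open_issues": [5, 4, 3, 2, 1],
}
scores = [2, 4, 6, 8, 10]

ranking = rank_indicators(indicators, scores)
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```

An indicator with a coefficient near zero, such as the constant `wiki_edits` column in this toy data, would be a candidate for removal. A wrapper method would instead evaluate whole subsets of indicators with a predictive model, which this sketch does not cover.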

The thesis concludes with a process mining analysis of the SCRUM workflow. Process flows of groups with good grades will be compared with those of groups with lower grades. In this way, an ’optimal’ workflow can be derived and offered to new student groups as a recommendation for working on their project [7]. This will be done using a PM tool such as Disco or Celonis. The data required by the PM tool can be exported from the GitLab Analyser. These results will be used to answer what a ’good’ workflow looks like and how it differs from the theoretical SCRUM process. The PM analysis process will be documented thoroughly to provide an easy guide for supervisors, so that they can quickly identify bottlenecks and other workflow problems for a single group.
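The comparison of process flows can be illustrated with a small sketch. The Python example below counts directly-follows relations, a basic building block of process discovery, in two event logs and reports where they differ. The tuple layout `(case_id, timestamp, activity)`, the activity names, and the toy logs are assumptions made for illustration; the actual export format of the GitLab Analyser and the algorithms used inside Disco or Celonis may differ.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is directly followed by b within one case."""
    traces = defaultdict(list)
    # Order events by case and timestamp so each trace is chronological.
    for case_id, _timestamp, activity in sorted(event_log, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts


def compare(counts_a, counts_b):
    """Difference in directly-follows counts (a minus b), omitting zero entries."""
    pairs = set(counts_a) | set(counts_b)
    diff = {pair: counts_a[pair] - counts_b[pair] for pair in pairs}
    return {pair: delta for pair, delta in diff.items() if delta != 0}


# Hypothetical toy logs: (case_id, timestamp, activity), illustration only.
good_log = [
    ("g1", 1, "create issue"),
    ("g1", 2, "commit"),
    ("g1", 3, "merge request"),
    ("g1", 4, "close issue"),
]
weak_log = [
    ("g2", 1, "commit"),
    ("g2", 2, "commit"),
    ("g2", 3, "create issue"),
    ("g2", 4, "close issue"),
]

good = directly_follows(good_log)
weak = directly_follows(weak_log)
for pair, delta in sorted(compare(good, weak).items()):
    print(pair, delta)
```

Positive values mark transitions that occur more often in the first log, negative values those that occur more often in the second. A PM tool visualizes the same relations as a process map with frequencies and durations.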

[1] Simschek, R. and Kaiser, F.: SCRUM: Das Erfolgsphänomen einfach erklärt. UVK Verlagsgesellschaft, 15–16, 2019.
[2] Schwaber, K.: Agiles Projektmanagement mit Scrum. Microsoft Press, 2007.
[3], 26.06.2020
[4] Aly, A.: Data Analysis and Process Mining opportunities with Git and GitLab log data in agile university projects. Master’s Thesis at the Machine Learning & Data Analytics department of Friedrich-Alexander-University Erlangen-Nürnberg, 95–100, 2020.
[5] Dumbach P., Aly A., Zrenner M., Eskofier B.: Exploration of Process Mining Opportunities In Educational Software Engineering – The GitLab Analyser. 13th International Conference on Educational Data Mining (Ifrane, Morocco (Fully Virtual Conference), 10. July 2020 – 13. July 2020). In: Anna N. Rafferty, Jacob Whitehill, Cristobal Romero, Violetta Cavalli-Sforza (ed.): Proceedings of the 13th International Conference on Educational Data Mining 2020.
[6] Guyon, I. and Elisseeff, A.: An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3, 1157–1182, 2003.
[7] van der Aalst, W.: Process Mining. Springer-Verlag, 43–44, 2016.