Bachelor's Thesis

Application of data and process analysis techniques for the evaluation of agile university projects


Philipp Dumbach (M.Sc.), An Nguyen (M.Sc.), Prof. Dr. B. Eskofier




SCRUM is the most widely used method in agile software development. Although it has been in
use for nearly 20 years, SCRUM has gained most of its users in recent years [1, 2]. This frequency
of use alone makes it worthwhile for universities to teach students agile methods in software
development. The Machine Learning and Data Analytics Lab (MaD Lab) at the Friedrich-Alexander
University Erlangen-Nürnberg (FAU) works with a slightly modified version of SCRUM in its student
courses [3]. The work was tracked via Git, a version control system, and GitLab, a web-based
application to visualize and manage the overall development process using Git data. The students
documented their process using the GitLab Wiki.
This thesis builds on the insights from Alexander Aly’s master’s thesis Data Analysis and
Process Mining opportunities with Git and GitLab log data in agile university projects [4]. He
developed a tool called GitLab Analyser that visualizes the students’ activities over time using
data from Git, GitLab and its Wiki. The tool is meant to help project supervisors detect possible
problems in the development process. Aly’s thesis identified a total of 15 potential indicators
which can be used to determine the project grades. Each of these indicators was set by its mean
or median. Furthermore, the GitLab Analyser can extract event logs of the development process,
which can be analyzed in detail in a Process Mining (PM) tool like Disco or Celonis [4, 5]. Based
on these previous findings, this bachelor’s thesis focuses on the following three research goals.
The first goal is to verify the selection of the indicators by follow-up experiments and by
checking the indicators using the GitLab Analyser with additional project data from FAU and
partner universities. Aly’s thesis had less data available, which made it difficult to provide
quantitative evidence. This additional dataset will also be used to check the similarity of the
SCRUM processes used in the three university courses.
Another goal of this thesis is to evaluate whether the chosen features are the best fit. The
literature shows that methods for feature subset selection and correlation analysis help to
determine an optimal feature set. For feature subset selection, filter and wrapper methods can be
applied. Feature subset selection also reveals the impact of a feature or a group of features on
the analysis result [6]. The Pearson correlation coefficient can be used to detect linear
dependencies between features. With such a technique, relationships between the indicators can be
determined mathematically, and it becomes possible to weight each indicator individually. In
addition, features which are not relevant for the final grade or a good workflow can be detected
[6]. The impact of each indicator on the final result will be examined in this part of the thesis.
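As an illustration of the filter approach described above, the following minimal sketch ranks
indicators by the absolute value of their Pearson correlation with the project grade. The indicator
names, the toy values and the helper functions pearson and rank_indicators are hypothetical and
chosen for illustration only; they are not taken from the GitLab Analyser or from Aly’s thesis.
Grades are assumed to follow the German scale, where a lower number is a better grade.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant sequence has no defined correlation; treat it as uninformative.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_indicators(indicators, grades):
    """Filter method: sort indicators by |r| with the grade, strongest first."""
    scores = {name: pearson(values, grades) for name, values in indicators.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: one value per student group (assumption, not real project data).
indicators = {
    "commits_per_sprint": [12, 30, 22, 41, 18],
    "wiki_edits": [5, 4, 6, 5, 4],
    "issues_closed": [3, 9, 7, 12, 5],
}
grades = [3.0, 1.7, 2.3, 1.0, 2.7]  # assumed German scale: lower is better

ranking = rank_indicators(indicators, grades)
for name, r in ranking:
    print(f"{name}: r = {r:+.3f}")
```

In this toy data, an indicator with a correlation close to zero would be a candidate for removal,
while the sign of r shows the direction of the relationship. A wrapper method would instead
evaluate feature subsets with the actual prediction model, which is not shown here.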
This thesis will conclude with a process mining analysis of the SCRUM workflow. Process
flows of groups with good grades will be compared with those of groups with lower grades. In this
way, an ’optimal’ workflow can be derived and offered to new student groups as a recommendation
for working on their project [7]. This will be done using a PM tool like Disco or Celonis. The
required data for the PM tool can be exported from the GitLab Analyser. These results will be
used to answer what a ’good’ workflow looks like and how it differs from the theoretical SCRUM
process. The PM analysis will be well documented to provide an easy guide for the supervisors,
so that they can quickly identify bottlenecks and other workflow problems for a single group.
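The comparison of process flows described above can be sketched with a directly-follows count,
the basic relation that PM tools use to draw process maps. This is a minimal sketch under stated
assumptions: the event log layout (case id, timestamp, activity), the activity names and the
function directly_follows are hypothetical and do not reflect the actual export format of the
GitLab Analyser or the internals of Disco or Celonis.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is directly followed by activity b within a case.

    event_log is an iterable of (case_id, timestamp, activity) tuples.
    """
    traces = defaultdict(list)
    for case_id, timestamp, activity in event_log:
        traces[case_id].append((timestamp, activity))

    counts = Counter()
    for events in traces.values():
        events.sort(key=lambda event: event[0])  # order each case by timestamp
        activities = [activity for _, activity in events]
        counts.update(zip(activities, activities[1:]))
    return counts


# Hypothetical event logs of two student groups (assumption, not real project data).
log_group_a = [
    ("sprint1", 1, "issue_opened"),
    ("sprint1", 2, "commit"),
    ("sprint1", 3, "merge_request"),
    ("sprint1", 4, "issue_closed"),
    ("sprint2", 1, "issue_opened"),
    ("sprint2", 2, "commit"),
    ("sprint2", 3, "commit"),
    ("sprint2", 4, "merge_request"),
]
log_group_b = [
    ("sprint1", 1, "commit"),
    ("sprint1", 2, "commit"),
    ("sprint1", 3, "issue_opened"),
    ("sprint1", 4, "issue_closed"),
]

dfg_a = directly_follows(log_group_a)
dfg_b = directly_follows(log_group_b)

# Transitions that occur in group A but never in group B.
only_in_a = set(dfg_a) - set(dfg_b)
for source, target in sorted(only_in_a):
    print(f"{source} -> {target}: {dfg_a[(source, target)]}x")
```

Transitions that appear only in one group, or with very different frequencies, point to the parts
of the workflow where the groups differ and where a closer look in the PM tool may be worthwhile.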



[1] Simschek, R. and Kaiser, F.: SCRUM: Das Erfolgsphänomen einfach erklärt. UVK Verlagsgesellschaft,
15–16, 2019.
[2] Schwaber, K.: Agiles Projektmanagement mit Scrum. Microsoft Press, 2007
[3], 26.06.2020
[4] Aly, A.: Data Analysis and Process Mining opportunities with Git and GitLab log data in
agile university projects. Master’s Thesis at the Machine Learning & Data Analytics department
of Friedrich-Alexander-University Erlangen-Nürnberg, 95–100, 2020
[5] Dumbach P., Aly A., Zrenner M., Eskofier B.: Exploration of Process Mining Opportunities
In Educational Software Engineering – The GitLab Analyser. 13th International Conference
on Educational Data Mining (Ifrane, Morocco (Fully Virtual Conference), 10. July 2020 –
13. July 2020). In: Anna N. Rafferty, Jacob Whitehill, Cristobal Romero, Violetta
Cavalli-Sforza (eds.): Proceedings of the 13th International Conference on Educational Data Mining
[6] Guyon, I. and Elisseeff, A.: An Introduction to Variable and Feature Selection. Journal of
Machine Learning Research 3, 1157–1182, 2003
[7] van der Aalst, W.: Process Mining. Springer-Verlag, 43–44, 2016