Alexander Tarek Aly

Master's Thesis

Data Analysis and Process Mining opportunities with Git and GitLab log data in agile university projects

Philipp Dumbach (M.Sc.), Prof. Dr. Björn Eskofier

10/2019 – 05/2020

Since the introduction of the Agile Manifesto in 2001, several frameworks for agile software development have emerged, among them Scrum, one of the most widely used agile process models today [1]. Agile methods are used throughout the software industry worldwide, which makes it reasonable to embed them in university courses to prepare students for future requirements. There are many examples of universities using Scrum within software engineering courses or in projects carried out in cooperation with real industry partners [2] [3]. A similar course, the Innovation Lab for Wearable and Ubiquitous Computing, is taught at the Friedrich-Alexander-University Erlangen-Nürnberg, where an adapted version of Scrum is used to develop innovative prototypes in cooperation with industry and clinical partners. Within the course, students use the distributed version-control system Git and the web-based application GitLab, a tool designed for managing Git repositories that also offers features supporting the Scrum process. These tools store log data about students’ activities during the development process. Some studies have already analyzed such data, for example to identify the contributions of individual team members or to understand how students develop software [4] [5], whereas only a few studies have discussed how this information can be used to improve the development process [6].

One approach to improving development processes is to derive process models from the information collected by the software systems in use. This approach is called process mining and can help to analyze, optimize and better understand software processes [7]. As mentioned in [1], there is strong interest in adapting Scrum appropriately to the development environment in order to increase productivity. Applying data analysis and process mining techniques to software processes can therefore provide insights that help tailor the Scrum process model to make teams more productive.
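
To make the idea of deriving a process model from log data concrete, the following is a minimal sketch of one basic process mining building block, a directly-follows count. It is an illustration only and not the method used in the thesis: the event tuples, the activity names ("opened", "commit", "closed") and the function name `directly_follows` are assumptions chosen for the example.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count how often one activity is directly followed by another
    within the same case (here: a hypothetical issue id)."""
    traces = defaultdict(list)
    # Order events by case and timestamp; ISO 8601 strings sort chronologically.
    for case_id, activity, _timestamp in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    counts = Counter()
    for activities in traces.values():
        # Pair each activity with its immediate successor in the trace.
        counts.update(zip(activities, activities[1:]))
    return counts


# Hypothetical event log: (case id, activity, ISO timestamp).
events = [
    ("issue-1", "opened", "2020-01-06T09:00"),
    ("issue-1", "commit", "2020-01-07T14:30"),
    ("issue-1", "closed", "2020-01-08T10:00"),
    ("issue-2", "opened", "2020-01-06T11:00"),
    ("issue-2", "commit", "2020-01-09T16:00"),
    ("issue-2", "commit", "2020-01-10T09:15"),
    ("issue-2", "closed", "2020-01-10T12:00"),
]

dfg = directly_follows(events)
for (source, target), count in sorted(dfg.items()):
    print(f"{source} -> {target}: {count}")
```

The resulting counts can be read as the edges of a simple process graph; which log entries count as a case and as an activity is a modelling decision that depends on the data actually available.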

The goal of this thesis is to evaluate the advantages of using Git and GitLab and to investigate how the generated data can be used to identify problems and working patterns during development, in order to adapt the Scrum process and help students be more productive in the future. In addition, the opportunities of applying data analysis and software process mining techniques within university projects will be evaluated. As a first step, it will be determined what information GitLab tracks. In the next step, the data from the Innovation Lab project repositories of the last two years will be extracted via the GitLab API and prepared for analysis. Based on the prepared data, an exploratory analysis will be performed to identify patterns and correlations in the student teams' ways of working and to investigate their impact on the development process. Based on the findings, three to four process improvement hypotheses will be evaluated and suggestions for tailoring the Scrum process will be derived.
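
The extraction step could look roughly like the sketch below, which pages through the commits endpoint of the GitLab REST API (version 4) and counts commits per author. This is an assumption about how such an extraction might be done, not the thesis implementation: the helper names `commits_url`, `fetch_commits` and `commits_per_author`, the host `gitlab.example.com` and the sample records are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter


def commits_url(base_url, project_id, page=1, per_page=100):
    """Build the URL of the commits endpoint for one result page."""
    # A project may be addressed by numeric id or URL-encoded path.
    project = urllib.parse.quote(str(project_id), safe="")
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    return f"{base_url}/api/v4/projects/{project}/repository/commits?{query}"


def fetch_commits(base_url, project_id, token):
    """Collect all commits of a project, following offset pagination."""
    commits, page = [], "1"
    while page:
        request = urllib.request.Request(
            commits_url(base_url, project_id, page),
            headers={"PRIVATE-TOKEN": token},  # personal access token
        )
        with urllib.request.urlopen(request) as response:
            commits.extend(json.load(response))
            # The X-Next-Page header is empty on the last page.
            page = response.headers.get("X-Next-Page", "")
    return commits


def commits_per_author(commits):
    """Count commits per author name in a list of commit records."""
    return Counter(commit["author_name"] for commit in commits)


# Hypothetical records in the shape returned by the commits endpoint.
sample = [
    {"author_name": "Student A", "committed_date": "2020-01-07T14:30:00+01:00"},
    {"author_name": "Student B", "committed_date": "2020-01-09T16:00:00+01:00"},
    {"author_name": "Student A", "committed_date": "2020-01-10T09:15:00+01:00"},
]
per_author = commits_per_author(sample)
print(per_author.most_common())
```

`fetch_commits` is defined but not called here because it needs network access and a valid token; the aggregation is shown on sample records instead. Commit counts alone are a coarse measure of contribution and would need to be combined with other tracked data, such as issues or merge requests.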


  1. Ashraf, S, Aftab, S (2017): Latest Transformations in Scrum: A State of the Art Review. International Journal of Modern Education and Computer Science, 9(7):12–22.
  2. Abdul, A, Bass, JM, Ghavimi, H, Adam, P (2017): Product innovation with scrum: A longitudinal case study. In: International Conference on Information Society (i-Society 2017). IEEE, Piscataway, NJ.
  3. Mahnic, V (2015): Scrum in software engineering courses: An outline of the literature. Global Journal of Engineering Education, 17:77–83.
  4. Cortes Rios, JC, Kopec-Harding, K, Eraslan, S, Page, C, Haines, R, Jay, C, Embury, SM (2019): A Methodology for Using GitLab for Software Engineering Learning Analytics. In: 2019 IEEE/ACM 12th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2019). Proceedings: 27 May 2019, Montréal, QC, Canada. IEEE Computer Society, Conference Publishing Services, Los Alamitos, CA.
  5. Parizi, RM, Spoletini, P, Singh, A (2018): Measuring Team Members’ Contributions in Software Engineering Projects using Git-driven Technology. In: 2018 IEEE Frontiers in Education Conference (FIE). IEEE.
  6. Mittal, M, Sureka, A (2014): Process mining software repositories from student projects in an undergraduate software engineering course. In: Jalote, P, Briand, L, van der Hoek, A (eds.), 36th International Conference on Software Engineering (ICSE Companion 2014). Proceedings: May 31-June 7, 2014, Hyderabad, India. Association for Computing Machinery, Inc, New York, NY.
  7. Rubin, V, Günther, CW, van der Aalst, WMP, Kindler, E, van Dongen, BF, Schäfer, W (2007): Process Mining Framework for Software Processes. In: Wang, Q, Pfahl, D (eds.), Software process dynamics and agility. International Conference on Software Process, ICSP 2007, Minneapolis, MN, USA, May 19–20, 2007. Proceedings. Springer, Berlin.