2. June 2026

ID 2603: Discovering LLM-generated texts through Corpus Linguistics

Research Project

We are offering two Master’s projects (10 ECTS) focused on detecting LLM-generated texts using Corpus Linguistics. The projects combine linguistic measurements with data science techniques and web scraping. For more details about the theory behind them, please have a look at our publication https://arxiv.org/pdf/2605.23651.

The student will be responsible for scraping a dataset, applying different models and libraries, and analyzing the results. Key tasks include:

Web scraping of various web sources.
Application of different models and libraries to extract linguistic features.
Analysis of the feature distributions and detection of interesting patterns in them.
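To give prospective applicants a feel for what extracting linguistic features can look like, here is a minimal sketch that computes three simple, commonly used corpus-linguistic measures with the Python standard library alone. The feature set, the naive regex tokenization, and the function name `linguistic_features` are illustrative assumptions. They are not taken from the linked publication, and the project itself will apply dedicated models and libraries.

```python
import re
from statistics import mean

# Hypothetical word pattern: letters, digits and apostrophes (illustrative only).
WORD = re.compile(r"[A-Za-z0-9']+")


def linguistic_features(text: str) -> dict[str, float]:
    """Compute a few simple corpus-linguistic features for one text (illustrative sketch)."""
    # Naive sentence split after ., ! or ? followed by whitespace;
    # a real pipeline would use a proper sentence tokenizer.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens.
    tokens = WORD.findall(text.lower())
    if not tokens:
        return {"type_token_ratio": 0.0, "mean_word_length": 0.0, "mean_sentence_length": 0.0}
    return {
        # Lexical diversity: number of distinct word types divided by number of tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Average token length in characters.
        "mean_word_length": mean(len(t) for t in tokens),
        # Average number of word tokens per sentence.
        "mean_sentence_length": mean(len(WORD.findall(s)) for s in sentences),
    }


if __name__ == "__main__":
    print(linguistic_features("The cat sat. The cat ran!"))
```

Applied to every text in a scraped corpus of human-written and LLM-generated texts, such per-text values yield feature distributions that can then be compared, for example with histograms or statistical tests.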

Candidate Profile: We do not expect applicants to have comprehensive prior expertise in linguistics, but we do expect some familiarity with deep learning and coding.

Ideal candidates will possess:

Experience with or interest in coding, especially in Python and the relevant ML libraries.
A willingness to engage in interdisciplinary research involving linguistics and, potentially, linguistic project partners.
A structured approach to problem-solving and an eagerness to troubleshoot technical challenges.

Supervisors

Björn Nieth, M. Sc.
Researcher & Doctoral Candidate

Please send me an email (bjoern.nieth@fau.de) with your resume and transcripts to apply for the topic. We will then get in contact with you if we are interested.

Last update: 2. June 2026 - 12:14
