
Quantifying the Accuracy of LLM-based Literature Surveys

ML & AI Security · Master Thesis · Kirchberg Campus

Overview

Systematic Literature Reviews (SLRs) are important but time-consuming. With the proliferation of Large Language Models (LLMs), it is only natural to look at automating this tedious task. But how accurate are LLMs in this process? And how accurate are humans at such a repetitive and tedious task in the first place?

In this project, we will quantify the accuracy of LLM-based literature surveys in the data extraction and paper classification stages. To this end, we engineer and optimize a process for LLM-based automation of these steps. We then apply this process to replicate existing studies and manually investigate cases where the human-based and LLM-based classifications differ. For now, we plan to limit our work to the cybersecurity domain, a domain in which our group has deep expertise.
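To give a flavor of the quantification step, the sketch below computes Cohen's kappa, a standard chance-corrected agreement metric between two annotators, here applied to hypothetical human and LLM paper classifications. This is only an illustrative assumption; the labels, categories, and the actual metrics used in the project are part of the thesis work itself.

```python
from collections import Counter

def cohens_kappa(human: list[str], llm: list[str]) -> float:
    """Chance-corrected agreement between two sets of labels
    for the same items (e.g., human vs. LLM paper classes)."""
    assert len(human) == len(llm) and len(human) > 0
    n = len(human)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(h == l for h, l in zip(human, llm)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    h_counts, l_counts = Counter(human), Counter(llm)
    labels = set(human) | set(llm)
    expected = sum(h_counts[c] * l_counts[c] for c in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical example: five papers, three cybersecurity subfields.
human = ["IDS", "IDS", "malware", "crypto", "IDS"]
llm   = ["IDS", "malware", "malware", "crypto", "IDS"]
print(round(cohens_kappa(human, llm), 3))  # → 0.688
```

A kappa near 1 indicates the LLM's classifications closely match the human baseline beyond what chance would produce; disagreeing items are exactly the cases the project would inspect manually.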

Requirements

  • Strong scripting skills, e.g., in Python.
  • Background in cybersecurity.
  • Interest in diving into different subfields of cybersecurity research.
  • Familiarity with LLMs is a plus.

Expected Outcomes

  • A process to automate SLRs.
  • Quantifiable results on how accurate our LLM-based process is compared to human-based SLRs.
  • Potential for publication in leading AI or cybersecurity conferences.

References

  1. Zhuang et al. "Large language models for automated scholarly paper review: A survey". Information Fusion (2025): 103332.
  2. Brereton et al. "Lessons from applying the systematic literature review process within the software engineering domain". Journal of Systems and Software, 80(4), 2007.
  3. Petersen et al. "Guidelines for conducting systematic mapping studies in software engineering: An update". Information and Software Technology, 64, 2015.
  4. Apruzzese et al. "SoK: Pragmatic assessment of machine learning for network intrusion detection". In IEEE 8th European Symposium on Security and Privacy (EuroS&P), 2023.
  5. Lamberts et al. "SoK: Evaluations in industrial intrusion detection research". Journal of Systems Research, 3(1), 2023.

Note: You may be asked to complete a coding challenge during the application process.

Interested in this project?

Contact the supervisor directly via email to discuss this opportunity.
