Chair of Programming Languages and AI

BSc/MSc Theses

This is a (non-exclusive) list of thesis topics currently offered at PLAI. Additionally, other related topics may be available.

Please contact the person listed for the topic, attach a transcript of records, and highlight all experience relevant to the BSc or MSc thesis topic you are interested in. Also, please take note of the languages the person speaks.

Open

  • MSc
    Malware detection in the CWS using Code Similarity

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    Code similarity measures how closely two pieces of source code resemble each other in structure, syntax, or behavior. It can be analyzed using various techniques, including syntactic (token-based, tree-based), semantic (execution-based, embedding models), and hybrid approaches. Applications include detecting plagiarism, identifying code clones, and clustering similar malware samples.
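As a rough illustration of the token-based end of this spectrum, the following sketch compares two source snippets by the Jaccard similarity of their token trigrams. The tokenizer regex, the function names, and the trigram size are assumptions made for this example; it is not a proper JavaScript lexer and not the method used in the related work.

```python
import re

# Hypothetical, deliberately crude tokenizer: identifiers, integer literals,
# and any other single non-whitespace character. Not a real JavaScript lexer.
TOKEN_RE = re.compile(r"[A-Za-z_$][\w$]*|\d+|\S")


def tokenize(source):
    """Split source text into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def shingles(tokens, n=3):
    """Return the set of n-token windows (shingles) of a token list."""
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(source_a, source_b, n=3):
    """Jaccard similarity of the token n-gram sets of two snippets (0.0 to 1.0)."""
    set_a = shingles(tokenize(source_a), n)
    set_b = shingles(tokenize(source_b), n)
    if not set_a and not set_b:
        return 1.0  # two empty inputs are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    original = "function f(x) { return x + 1; }"
    renamed = "function g(x) { return x + 1; }"
    unrelated = "const y = fetch(url);"
    print(jaccard_similarity(original, renamed))
    print(jaccard_similarity(original, unrelated))
```

A purely syntactic measure like this is cheap but easy to evade through renaming or minification, which is one reason tree-based, semantic, and hybrid techniques exist.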

    Your Part Use recently detected malicious browser extensions to find further malware samples. We want a fully automatic approach that could be applied iteratively on this ecosystem.

    Prerequisites Strong JavaScript and HTML knowledge is required for this topic.

    Related Work Did I Vet You Before? Assessing the Chrome Web Store Vetting Process through Browser Extension Similarity

  • MSc
    Building a Cross-Architecture Testsuite for ARM32 binary patches
  • MSc
    Studying cases of Affiliate Fraud in the CWS - Honey

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    Honey is a popular browser extension that automatically applies coupon codes at checkout to help users save money while shopping online. Affiliate fraud occurs when an entity manipulates an affiliate marketing program to illegitimately earn commissions, often by hijacking or injecting affiliate links without user consent. Honey was accused of engaging in affiliate fraud by automatically replacing or injecting its own affiliate links when users made purchases, allowing it to earn commissions while misleading users about the nature of its cashback and discount services.

    Your Part Reverse engineer the source code of Honey and search for related cases in the CWS. We want to detect semantically similar browser extensions in this ecosystem.

    Prerequisites Strong JavaScript and HTML knowledge is required for this topic.

    Related Work Exposing the Honey Influencer Scam

  • MSc
    Detecting clones in the CWS - Analysis of Imitation Attacks

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The Chrome Web Store contains many extensions that mimic popular ones in name, appearance, and functionality, tricking users into installing them by mistake. These clone extensions often introduce security risks, such as injecting ads, stealing user data, or executing malicious code. The lack of rigorous vetting and automated detection mechanisms allows attackers to exploit user trust and distribute harmful software.

    Your Part Analyze extensions in the Chrome Web Store to identify clones of popular extensions. Develop an approach to detect imitation attempts based on code similarity, metadata, and/or behavior. The solution should scale to continuously monitor and flag potential security risks.

    Prerequisites Strong JavaScript and HTML knowledge is required for this topic.

    Related Work What is in the Chrome Web Store? Investigating Security-Noteworthy Browser Extensions

  • MSc
    Vulnerability Detection in IoT-Firmware binaries
  • BSc
    Replicate results from research done for CWS - Hardening the Security Analysis

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The paper Hardening the Security Analysis of Browser Extensions conducts a systematic study of attack entry points within the browser extension ecosystem, identifying both known and novel vulnerabilities such as password theft, traffic interception, and inter-extension attacks. By analyzing the interactions between extensions, browsers, and web applications, the authors propose a comprehensive approach to enhance security analysis, combining static and dynamic methods to detect insecure extensions and, in some cases, synthesize attack payloads. Their evaluation, which involved downloading and examining 133,365 extensions from the Chrome Web Store, underscores the necessity for a more robust threat model to effectively mitigate these security risks.

    Your Part Repeat the experiments done for the paper in the current state of the Chrome Web Store. We want to validate if their claims still hold in this ecosystem.

    Prerequisites Strong JavaScript and HTML knowledge is required for this topic.

    Related Work Hardening the Security Analysis of Browser Extensions

  • BSc
    Replicate results from research done for CWS - No Signal Left to Chance

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The paper No Signal Left to Chance: Driving Browser Extension Analysis by Download Patterns investigates the use of download patterns as a signal for analyzing browser extensions. By leveraging machine learning to cluster extensions based on their download behaviors, the study identifies groups of extensions with similar patterns, some of which are associated with malicious activity. The authors demonstrate that analyzing these patterns can effectively detect malicious extensions, enhancing security measures in browser ecosystems.

    Your Part Repeat the experiments done for the paper in the current state of the Chrome Web Store. We want to validate if their claims still hold in this ecosystem.

    Prerequisites Strong JavaScript and HTML knowledge is required for this topic.

    Related Work No Signal Left to Chance: Driving Browser Extension Analysis by Download Patterns

  • MSc
    ML-based Transpilation from C to Rust

    Application: Please write an email describing your relevant experience and attach a CV, your transcript of records (incl. bachelor grades), and a writing sample.

    Motivation

    • Rust is a modern programming language with advantages in memory safety. However, rewriting existing code bases is too expensive for certain companies.
    • We are researching ML-based methods to support automatic translation ("transpilation").

    Your Part

    • Based on Facebook's TransCoder Model implementation, re-train the model for transpiling code snippets from C to Rust, e.g. using datasets like Rosetta-Code
    • Evaluate its performance based on the size of the input C code and context

    Prerequisites

    • Basic knowledge in transformer architectures and training
    • Basic knowledge of C and Rust semantics
  • BSc/MSc
    Automatic Modularization of Software Projects for LLM-based Transpilation

    Application: Please write an email describing your relevant experience and attach a CV, your transcript of records (incl. bachelor grades), and a writing sample.

    Motivation

    • With the help of LLMs, more and more tools are being developed for transpiling software from old programming languages to new ones, e.g. C to Rust transpilation.
    • However, most LLM-based research prototypes only operate on code snippets, because their context size is too small.

    Your Part

    • Given an arbitrary context size limit of an LLM ($n$ characters), implement an algorithm that splits the code base into small chunks of transpilable code snippets, e.g. by computing a call graph and identifying strongly connected components among the functions in the program.
    • Challenge: How to split the code base into arbitrarily small chunks while ensuring that each chunk is still compilable.
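The call-graph idea above can be sketched as follows. This is a minimal illustration under stated assumptions: the call graph is given as a dictionary from function name to callee names, function sizes are given in characters, and the names `tarjan_scc` and `split_into_chunks` are hypothetical. It groups mutually recursive functions with Tarjan's algorithm and greedily packs the components under the limit; it does not address the compilability challenge (declarations, types, macros).

```python
def tarjan_scc(call_graph):
    """Return strongly connected components of a call graph.

    call_graph maps each function name to the names it calls.
    Components come out callees-first (reverse topological order).
    Recursive for brevity, so only suitable for small graphs.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in call_graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for v in sorted(call_graph):
        if v not in index:
            visit(v)
    return components


def split_into_chunks(call_graph, sizes, limit):
    """Greedily pack components into chunks of at most `limit` characters.

    A component is never split; one that exceeds the limit on its own
    ends up alone in an oversized chunk.
    """
    chunks, current, current_size = [], [], 0
    for component in tarjan_scc(call_graph):
        component_size = sum(sizes[f] for f in component)
        if current and current_size + component_size > limit:
            chunks.append(current)
            current, current_size = [], 0
        current.extend(component)
        current_size += component_size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical program: eval and apply are mutually recursive.
    graph = {"main": ["parse", "eval"], "parse": ["lex"], "lex": [],
             "eval": ["apply"], "apply": ["eval"]}
    sizes = {"main": 40, "parse": 30, "lex": 20, "eval": 50, "apply": 45}
    print(split_into_chunks(graph, sizes, limit=100))
```

Because components are emitted callees-first, a chunk only calls functions in the same or an earlier chunk, which may help when transpiling chunk by chunk.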

    Prerequisites

    • Basic principles of Software Engineering
    • Experience with the C programming language
    • Lecture "Program Analysis for Security" or "Compiler-Design"
  • BSc
    Finding Safe Rust Replacements for C Libraries

    Application: Please write an email describing your relevant experience and attach a CV, your transcript of records (incl. bachelor grades), and a writing sample.

    Motivation

    • Rust is a modern programming language that combines strong performance with improved security. However, rewriting existing code bases is too expensive.
    • To encourage widespread reuse of existing Rust libraries, we want to build an index of Rust libraries from https://crates.io/ that can act as replacements for widely used C libraries, e.g. the image library as a replacement for libpng.
    • This way, all C applications that depend on libpng might be made a little bit safer by swapping out libpng for Rust's "image" library.

    Your Part

    • Use a combination of LLMs and deterministic methods to implement a search for potential Rust replacements for a given C library.
    • By iterating this search over multiple popular C libraries, build an index of potential Rust replacements.
    • If possible, not only search for functionally equivalent Rust libraries but also assess other properties that would simplify the developers' work in the replacing process, e.g. potential API compatibility, build system compatibility, risk of including unstable 3rd party dependencies (see: https://tweedegolf.nl/en/blog/119/sudo-rs-depencencies-when-less-is-better)
    • As a case study, take an example C application and one of its libraries with an adequate Rust replacement and evaluate the results for equivalence and performance.
  • BSc/Project
    Disk-Assisted Vulnerability Scanning

    Motivation When designing static vulnerability scanners, the goal is to minimize false positives while still identifying non-trivial data flows and scaling to real-world apps. IFDS-based algorithms have proven to be very useful, but they still require hundreds of gigabytes of RAM to analyze particularly complex apps. To still be able to run the analysis in more resource-constrained settings such as developer workstations, researchers have proposed to swap out less used data to the disk.
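The general idea of swapping less-used data out to disk can be sketched with a small least-recently-used map that spills evicted entries to files. This is an assumption-laden illustration only: the class name `DiskBackedMap`, the pickle-based file layout, and the LRU policy are choices made for this example, not the scheme from the related work, and the real implementation would be in Java.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class DiskBackedMap:
    """Keep at most `capacity` entries in memory; spill the rest to disk (LRU)."""

    def __init__(self, capacity, directory=None):
        self.capacity = capacity
        self.memory = OrderedDict()   # key -> value, least recently used first
        self.on_disk = {}             # key -> path of the spilled value
        self.directory = directory or tempfile.mkdtemp(prefix="spill_")
        self._next_id = 0

    def put(self, key, value):
        self._drop_from_disk(key)     # a fresh value supersedes a spilled one
        self.memory[key] = value
        self.memory.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.on_disk:
            with open(self.on_disk[key], "rb") as handle:
                value = pickle.load(handle)
            self.put(key, value)      # bring the entry back into memory
            return value
        raise KeyError(key)

    def _drop_from_disk(self, key):
        path = self.on_disk.pop(key, None)
        if path is not None:
            os.remove(path)

    def _evict(self):
        # Write least recently used entries to disk until memory fits again.
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            path = os.path.join(self.directory, f"{self._next_id}.pkl")
            self._next_id += 1
            with open(path, "wb") as handle:
                pickle.dump(old_value, handle)
            self.on_disk[old_key] = path


if __name__ == "__main__":
    facts = DiskBackedMap(capacity=2)
    for name in ("a", "b", "c"):
        facts.put(name, {"fact": name})
    print(sorted(facts.on_disk))  # keys currently spilled to disk
    print(facts.get("a"))         # transparently reloaded from disk
```

Which data to evict, and when, is the interesting design question for a dataflow solver; plain LRU is only the simplest possible baseline.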

    Your Part The research group is notorious for providing unreusable artifacts, i.e., only publishing binary blobs. We want to make an open-source implementation available. If done as a bachelor's thesis, you will additionally empirically evaluate your implementation and think about further improvements.

    Prerequisites You will be implementing the solution on top of a large Java codebase, so proficiency in Java is beneficial.

    Related Work Scaling Up the IFDS Algorithm with Efficient Disk-Assisted Computing

  • MSc
    Lifting Cooperative Taint Analysis

    Motivation Taint analysis is a fundamental dataflow problem that can be used to identify security vulnerabilities such as SQL injections, XSS, insecure deserialization and many others. To be useful and not overwhelm the user with too many false positives, the state-of-the-art uses precise heap abstractions such as access graphs or access paths. You can think of an access path as a variable in the code and a chain of field references with a maximum length of k. These heap abstractions are additionally enriched with type information and other metadata.
    One algorithm to solve these dataflow problems is IFDS. In particular, the ability to solve two dependent problems in cooperation, i.e. resolving aliases on-demand for the taint analysis, allows IFDS-based taint analysis to discover non-trivial dataflows while still being reasonably precise. However, IFDS has a worst-case time complexity of O(|E| |D|^3), with D being the domain, and the access path domain can grow quite large, making it impossible to analyze complex applications in a reasonable timeframe.
    IDE is a generalization of IFDS, originally developed to solve map domains such as constant propagation (Var -> Value), and allows splitting a powerset domain into a dataflow domain and a value domain. The time complexity of IDE is the same as for IFDS.
    A recent paper (Oct 2024) shows that it is possible to split the access path heap model into a dataflow domain of local variables and a value domain of field (de-)references, leading to average speedups of 200x(!) compared to the equivalent IFDS formulation. Besides basing their work on an artifact from 2016 and not publishing their code, the authors also consider alias analysis orthogonal to the taint analysis problem and resolve aliases with a previously published alias analysis that still operates on an expensive access graph domain.
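The access path abstraction mentioned above (a base variable plus a chain of at most k field references) can be sketched as follows. The class and method names are hypothetical and chosen for this example, and the sketch omits the type information and metadata that real implementations attach; it only illustrates how limiting the chain length keeps the domain finite.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AccessPath:
    """A base variable followed by at most k field references.

    `truncated` marks a path whose tail was cut off at length k; such a
    path over-approximates every longer path that shares its prefix.
    """
    base: str
    fields: Tuple[str, ...] = ()
    truncated: bool = False

    def append(self, field, k):
        """Return the path for `self.field`, cutting the chain off at length k."""
        if self.truncated or len(self.fields) >= k:
            return AccessPath(self.base, self.fields, True)
        return AccessPath(self.base, self.fields + (field,), False)

    def __str__(self):
        path = ".".join((self.base,) + self.fields)
        return path + ".*" if self.truncated else path


if __name__ == "__main__":
    k = 2
    path = AccessPath("a").append("f", k).append("g", k)
    print(path)                  # a.f.g
    print(path.append("h", k))   # a.f.g.*
```

Because truncated paths with the same prefix compare equal, the number of distinct facts per variable stays bounded, at the cost of some precision beyond depth k.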

    Your Part First, we want to reproduce the work based on an up-to-date version of Soot and FlowDroid, because we are highly interested in open-sourcing the more scalable analysis. Second, we have the idea of a cooperative alias analysis in IDE. IDE's phase 1 is basically IFDS, so we think it is possible to exchange dataflow facts in phase 1. Also, FlowDroid's call site matching during the path building stage has already shown that you can solve a context-free language across different analysis directions. Thus, the research question is whether it is possible to extend the CFL grammar, i.e. the value domain, to also solve aliases on-demand and asynchronously with multiple IDE solvers, as is done with IFDS. To the best of our knowledge, there isn't yet any paper that uses multiple IDE solvers in cooperation, so this is actually something completely new to work on!

    Prerequisites

    • You will be implementing the solution on top of a large Java codebase, so proficiency in Java is beneficial.
    • Furthermore, this topic requires you to design a complex static analysis that is supposed to analyze real-world Java applications, which requires good knowledge about the Java language semantics.
    • Knowledge about static analysis and context-free languages obtained from courses like Principles of Compiler Design or Program Analysis for Security is also beneficial.

    Related Work

In Progress



Last update 25.02.2025