Student projects for supervision by Hayo Thielecke

Generally, I am interested in supervising projects that are about programming languages (C/C++, OCaml, Agda, Scala, Rust, even Javascript...), such as analyzing or translating programs. If you have your own project idea related to the above, please feel free to discuss it.

Usage patterns of templates in real-world C++ code

Traditional object-oriented design patterns emphasize inheritance. However, in modern C++ the importance of parametrization using templates has increased; see for example the Standard Template Library. This topic will investigate high-level program design using C++11 templates, with emphasis on type safety and correctness. For example, both the LLVM compiling framework and games engines make extensive use of C++ templates, so they could used as a code base for investigation.

Safe systems-level programming in Rust

C is still the language of choice for programming operating systems and other code that needs to be efficient and manage system resources. However, C code is often afflicted with buffer overflows and other memory corruption vulnerabilities. Some new programming languages (such as Rust) aim to give the same power as C while being much safer (eliminating buffer overflows, for instance).

Porting RXXR2 from Ocaml to C++

Out tool RXXR2 automatically finds REDoS vulnerabilities in regular expressions. The target audience and intended users for this tool are systems security people, whose canonical language tends to be C rather than functional languages. The aim of this project is to re-implement RXXR2 in C++ using the current Ocaml implementation as a guideline. It is a good opportunity for students who have taken the C/C++ module to become more fluent by writing a sizeable program in C++.

Gamification in teaching computer science

Browser games (or mobile apps) that help students learn computer science concepts and techniques. There are many core CS topics that could be used as a basis for games: regular expressions, grammars, C syntax, pointers, ...

Browser game in Javascript about planetary exploration

This is a game like Spaceplan, but with more realistic physics. There is a sun, planets orbit the sun and moons orbit the planets. There are resources, such as solar cells or Helium-3 on the planets and moons for nuclear fusion. You navigate a spacecraft among the planets. The more resources you have gained, the more energy you have; the more energy you have available, the more resources you can gain, such as fuel, more powerful engines, or planetary probes that gather materials. >/p>

Traceabily enhancements for LLVM

(Suggested by Asiri at ARM.) Modern compilers perform such aggressive optimisations on programs that render the correspondence between the input sources and the final binaries quite un-obvious for human understanding. Yet, certain sensitive application domains (such as aviation) demand that final binaries be manually inspected to ensure that they are semantically equivalent to the input sources. Traditionally, debug information allows one to establish binary-to-source correspondence, but this technique cannot be relied upon at higher optimisation levels, where the most advanced transformations make it extremely difficult to maintain a useful debug view (e.g. code-motion). Moreover, the resource-constrained nature of the concerned application domains makes it infeasible to compile programs at lower optimisation levels.
The aim of this project is to develop tools/techniques to aid in the manual verification of programs compiled at higher optimisation levels. A good starting point is the --print-after=all option provided by clang that dumps the LLVM IR output for each of the optimisation passes involved. This is a rudimentary debug routine which is neither user-friendly nor complete enough to be deployed in a production environment, but one can imagine ways to build on this option.

Contification in LLVM and CPS

Contification is a compiler optimization that turns some function calls into jumps. It has been studied in CPS compilers for functional languages, but LLVM does at least some of it for compiling C, generating some impressively tight machine code. The aim of this project is to compare what LLVM does to contification expressed in CPS, and see if there is scope for LLVM to do more optimization than it currently does.

LLVM memory layout visualizer

Add code to the LLVM compiler that does the following: for each function, draw the layout of its stack frames; and/or for each C struct or C++ object, draw the how its members are laid out in memory. The information is contained in the LLVM internal representation; the output should be in LaTeX or a format suitable for a drawing package.

Adding unicode support to RXXR2

Adding unicode to RXXR2 would be useful. See here for some of the issues.

Integration of RXXR2 into an IDE

The project involves the integration of the static analyser RXXR2 into a popular IDE, such as Eclipse for Java. The aim is that programmers should receive automatic warnings about potential REDoS vulnerabilities. When using the IDE, the user's code should be automatically searched for uses of regular expressions by running RXXR2 in the background. For example, certain arguments to API calls in the Java regular expression libraries should be extracted and fed to RXXR2.

Formalizing correctness properties of RXXR2 in Agda

Fast parallel regular expression matching on GPUs

In much of the literature you will see the claim that GPUs can match regular expression in log time. This seems impossible at first, but there is some ingenious theory that makes it work in principle, namely Blelloch’s prefix sum operation. I would like to see some experiments to gauge how practical such fast parallel matching actually is. The project consists of implementing the lexer sketched in Steele and Hillis’s paper in CUDA or OpenGL. Potential applications may be in areas such as biology rather than CS, e.g. finding patterns in DNA.

Recursive-ascent parsing

Top-down, or recursive descent, parsers can be written in a pleasantly direct style as a collection of mutually recursive methods in Java. By contrast, bottom-up parsers (like those generated by yacc/bison) are typically table-driven. Recursive ascent generates methods for a bottom-up parser. The project would explore writing such parsers in C/C++, OCaml or Haskell.

Eliminating left recursion and parsing actions

Some popular parser generators such as ANTLR will not work on grammars with left recursion. There is a standard technqiue for eliminating left recursion from grammars. However, a parser in ANTLR also contains semantic actions in the form of Java or C code. These actions also need to be transformed. Doing so is the aim of this project, by programming an appropriate pre-processor for ANTLR. It should be based on the L-calculus in my 2014 paper “On the Semantics of Parsing Actions” in Science of Computer Programming (see my publications page).