Static analysis for regular expression denial-of-servive attacks
RXXR is a static analysis tool that detects whether regular
expressions are vulnerable to redos attacks, which can cause Java to
hang. This project extends the analysis to cover a broader class of
regular expressions than currently supported by the tool and to make
it more user-friendly. Porting RXXR to Scala will produce a stand-alone application in Java bytecode, making the analysis easier to use for Java developers.
For more details, see
the RXXR page.
This project is suitable for: MSc Computer Security, undergraduate CS, MSc Advanced Computer Science
Latex to ibook translator
NB: while this would be a very useful tool to build, I have not looked into its feasibility, involving a propriety format. Academics often write lecture notes in latex, which produces great PDFs for printing or viewing on a desktop. But at the moment there is no good tool to translate whole Latex documents to an electronic book format suitable for handheld devices.
ANTLR and left recursion
This project builds a program transformation of ANTLR programs to make
them suitable for parsing. The theoretical basis is my paper in
PPDP'12 (see my publications page).
This project is suitable
for: undergraduate CS, MSc Advanced Computer Science
Regular expression matching and parsing
Regular expressions (and their extensions) are everywhere, yet their implementation in Java is not anywhere as efficient as it could be. Projects in this area will leverage the large number of cores on a GPU (Graphics Processor) to match regular expressions in parallel. These techniques may also be extensible to parsing, such as XML parsing.
Programming in ML or Scala
ML is an advanced programming languages, and there are several dialects of ML with efficient implementations, such as OCAML and F Sharp. I am interested in projects that use ML. Note that many of the features of ML-style languages are covered in the final-year optional module on Principles of Programming Languages.
Push versus pull parsing
In the context of XML parsing in Java, there are two contrasting approaches to the interaction between the parser and the rest of the Java program: push technology uses method callbacks, whereas pull uses streams. The aim of this project is to analyse this distinction in a wider computer science context (there may be connections to continuations and streams in functional programming, or the Visitor Pattern in Object-oriented design).
Software engineering and web design
Web pages have links between them. That is analogous to data structures with pointers. They tend to be updated by different people, often with disregard for incoming links or consistency. That is analogous to concurrent processes updating data structures with pointers, and the problem of maintaining data structure invariants. So the question here is: can concurrent programming shed some light on how to design a web site and how it evolves?
Recursive-ascent parsing
Top-down, or recursive descent, parsers can be written in a pleasantly direct style as a collection of mutually recursive methods in Java. By contrast, bottom-up parsers (like those generated by yacc/bison) are typically table-driven. Recursive ascent generates methods for a bottom-up parser. The project would explore writing such parsers in Java.
Visitor-oriented programming
The visitor pattern is an object-oriented design pattern for traversing data structures, such as the parse trees described in the parsing part of the SSC1 module.
Aspect-oriented programming
Aspect weaving can potentially change the behaviour of the original code beyond what may have been intended. How can reasoning about methods in terms of pre- and postconditions be adapted to aspect-oriented programming?
Trusted Platform Module versus return-oriented programming
Does the TPM help against buffer overflow attacks? At first sight, it may appear obvious that it should prevent attacks that inject malicious code. However, advanced buffer overflow attacks re-use existing code "gadgets" to construct a malicious interpeter for arbitrary code. Hence, the TPM may not be as effective in this regard.
Javascript security, particularly object capabilities
The Java security model has become much more complex than the initial sandbox design. At the same time, users face risks from malicious Javascript code. Recent research has proposed object-capability models for Javascript; see also Google Caja. Some related questions include: How does the Javascript security architecture compare to Java's? How does the process isolation in Google's Chrome browser impact Javascript security?
Google Native Client
The project is a case study about using Google Native Client.
App security
How secure are apps in Android or the iPhone? Specifically, how effective are the sandboxing techniques used, and could code transformation or analysis be used to enhance security?
Safer programming languages as an alternative to C
Some modern programming languages are designed to be suitable for writing code that would traditionally be written in C, but without the security problems that are widespread in C code. A project could be based on a case study, such as porting C code to a safer language. Some previous projects have successfully done so for Cyclone. Another language that would be interesting in this regard is the Go programming language under development at Google.
Security of anti-virus software
Anti-virus software is marketed aggressively, despite the fact that it is highly invasive (comparable to root kits) and so poses significant security risks itself. The recently published buffer overflow in Clam AV is a case in point. This project aims to assess the security risk (and other consequences) of AV software, ideally by auditing some open-sourcve AV (as it may be unfeasible to find information on commercial products, although using a debugger is a possibility). It is not necessarily expected that an actual exploit is found, but that the risk is put in perspective.
Using Java annotations for security
This project aims to use annotations in Java code to document good secure coding practice and to enable automatic tools to check this.
Economics of software security
One explanation of why computer security is so bad involves economic factors. Some of the relevant buzzwords from microeconomics are incentive incompatibility and information asymmetry. For instance, the cost of insecure code is often borne by someone else (a negative externality). Yet software engineering and software security lifecycles do not seem to take such problems into account. What happens when there are security holes in libraries or third-party components? Can things like contracts be extended with security guarantees to address security externalities?
Security implications of Java RMI
In Java, distributed systems can be programmed using Remote Method Invocation (RMI). As RMI can lead to dynamic class loading, there is a risk of malicious code injection, so that appropriate security checks need to be used. This project investigates techniques for doing so, focusing on passing remote and non-remote objects back and forth between client and server. A possible result of the project could be to identify Design Patterns for using RMI securely.
Understanding Java stack inspection
Stack inspection in Java is designed to prevent malicious code from mounting the "confused deputy" attack by calling methods to do its bidding. Stack inspection is a relatively new and in some ways ad-hoc mechanism. There is an active area of research that tries to put it into a more systematic framework for access control. The aim of this project is to review and build on this literature. Hence the project is largely dissertation, but it should include code examples.
Aspect oriented programming for security
Aspects can evidently be used for secure programming, such as adding access control to code. As aspects are a fairly new technology, it is not entirely clear how useable or scalable aspects are in this role. This projects examines some case studies of aspects for security and evaluates them. Ideally, aspects for security should also be compared to alternative forms of access control, such as Java stack inspection.
Defending against command injection attacks
One of the main avenues of attack on web applications consists of injection of SQL statements. Some recent research has developed defences against such attacks, based on parsing to check whether the input violates assumptions made by the programmer. The aim of this project is to build on the research, and perhaps to apply it to other relevant web technologies, apart from SQL; perhaps eval functions and the like.
Software security audit
We have had a very successful MSc project in which the code of a small software company was audited for security following the methodologies of Dowd et al and McGraw. It would be nice to have such projects again; the main difficulty is to find companies that are willing to trust a student with their code base. (The results would not be made public and only read by the supervisor and second marker.)
Game theory in computer security
Computer security can lead to arms races between attack and defence. For instance, some compiler techniques defend against the more straightforward kinds of stack buffer overflow; and in response, attackers use more devious attacks, leading to further defences. Game theory was developed to deal with strategic behaviour between adversaries (as in the Cold War). Can Game theory shed some light on this ongoing interaction between attacker and defence? Can it tell us how to get the most value for money in computer security?
Javascript/CSS/XHTML obfuscator
To avoid having web content plagiarized, one may render the source unreadable by automatic obfuscation. This project should use a parser generator (like ANTLR, SableCC, JavaCC,...) to read the web source code, and then produce an obfuscated version.
Security in on-line games
Attacks on on-line games are predicted to be a growth area of computer insecurity. This project consists of cases studies of such attacks, focusing on the particular technical challenges of online games, such as distributed, highly concurrent software. For instance, race conditions could be an issue.