Student Projects in Summer 2018 (PG) and 2018/19 (UG)
I like to supervise projects that involve solving a problem or answering a question and that require a combination of computational thinking and efficient implementation. I am particularly interested in supervising projects in analysis of complex biological datasets. These projects typically involve the analysis of large amounts of rather noisy data in which it is hard to design rules for extracting features. For this reason, machine learning is becoming increasingly important in my work and I am keen to supervise projects in this area.
I am very open to supervising multiple students on different aspects of the same problem. This can be helpful in understanding very complex problems whilst still doing individual work. Please do approach me if you would like to discuss this.
My office hours are 9:00-10:00am on Mondays and Tuesdays; if these are not convenient you are welcome to email me for an appointment.
Some possible project areas are listed below:
Learning the structure-function relationship in GPCRs
G-Protein-Coupled Receptors (GPCRs) are a type of protein found in the membrane of cells. Their role is to sense the cell's environment and trigger chemical processes inside the cell in response. They are the target of nearly 50% of all drugs, so understanding how they work is of enormous importance in developing new therapies. Professor Dmitriy Veprintsev in Nottingham and I are working together to understand how the structure of GPCRs affects their function. Dmitriy is performing exquisite experiments that manipulate the structure and measure the multidimensional function for each manipulation. These experiments are difficult and time-consuming, so we are developing machine learning techniques that can learn from the results and generalise from them to predict which experiments we should be doing next. There is scope for several projects in this area, or for a group working together on different aspects.
Analysis of microscopy images
Visualisation of colocalisation of EGF-EGFP with rab5-mRFP in a HeLa cell. Adapted from Pike et al, Methods, 2017.
Through my role in COMPARE - the Centre of Membrane Proteins and Receptors - I have access to a wide range of data from the most advanced microscopy techniques currently available. These techniques allow biology to be studied at the molecular level at ultra-high spatial resolution. Some examples of the techniques we are using include:
- STORM - stochastic optical reconstruction microscopy
- SPIM - selective plane illumination microscopy
- SIM - structured illumination microscopy
- Lattice light sheet microscopy
- Confocal microscopy
- TIRF - total internal reflection fluorescence microscopy
I am interested in using both classical image processing techniques and modern machine learning techniques to understand these richly informative datasets. Projects in this area are very open-ended and are best suited to highly motivated, ambitious students who would like to study image processing, computer vision and machine learning in depth. There will naturally be a reasonable amount of programming, but the main complexity will be in the design of the algorithms. Some mathematics may be involved.
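To give a flavour of the classical end of this spectrum: colocalisation between two fluorescence channels is commonly quantified with Pearson's correlation coefficient over pixel intensities. The sketch below is only an illustration, not the method used in any particular project; the function name and the tiny 2x2 "images" are made up for the example, and real data would first need background correction and masking.

```python
import math


def pearson_colocalisation(channel_a, channel_b):
    """Pearson correlation between two equally sized images given as lists of rows."""
    # Flatten both images into 1-D lists of pixel intensities.
    a = [v for row in channel_a for v in row]
    b = [v for row in channel_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("channels must be non-empty and the same size")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)

    # Covariance and variances (un-normalised; the 1/N factors cancel).
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant-intensity channel has no defined correlation")

    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy channels: 'red' tracks 'green', 'anti' is its mirror image.
green = [[10, 20], [30, 40]]
red = [[12, 22], [32, 42]]
anti = [[40, 30], [20, 10]]

r_same = pearson_colocalisation(green, red)
r_anti = pearson_colocalisation(green, anti)
```

A value near +1 suggests the two labels occupy the same pixels, a value near -1 suggests they exclude each other, and values near 0 suggest no linear relationship. Much of the interesting project work lies in what this simple measure misses.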
Structure of C3 immune protein. From https://en.wikipedia.org/wiki/Complement_component_3
Machine Learning in Proteomics
Automatically identifying proteins and their post-translational modifications from chemical analysis is a very hard problem on which relatively little work has been done. I have several ideas for how one can attack this question of central importance, drawing on recent advances in the machine learning literature. A related problem is to predict the folding of proteins from their amino acid sequence alone. There have been some interesting approaches to this recently and I have some ideas for how to extend and improve these. These projects are potentially suitable for someone who wants to explore the state of the art in machine learning. The programming should be straightforward, but algorithm development will require you to develop a deep understanding of the latest developments in ML.
Extensible Software Toolkits for Bioimage Data
Based on the latest open standards for data exchange, there are opportunities to develop and distribute open analysis packages for bioimage data. I have done some work on this in the past, in the form of a universal format converter, but would like to turn this into a more complete set of analysis tools that can be easily extended with new functionality. There are some serious challenges here due to the sheer size of some of the datasets, which can run to hundreds of GB for a single image set. I am especially interested in what functional programming languages (especially Scala) have to offer in this domain. These are engineering projects suitable for keen programmers.
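One way to cope with image sets that do not fit in memory is to stream them tile by tile and keep only running summaries. The sketch below, written in Python for brevity, uses Welford's online algorithm to compute the mean and standard deviation of pixel intensities in a single pass. The tile generator is a hypothetical stand-in for a real file reader, and nothing here reflects the design of the existing converter.

```python
import math


def running_stats(tiles):
    """Mean and population standard deviation over an iterable of tiles.

    Only one tile is held in memory at a time (Welford's online algorithm).
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for tile in tiles:
        for value in tile:
            count += 1
            delta = value - mean
            mean += delta / count
            m2 += delta * (value - mean)
    if count == 0:
        raise ValueError("no pixels supplied")
    return mean, math.sqrt(m2 / count)


def synthetic_tiles(n_tiles, tile_size):
    """Hypothetical stand-in for a reader that yields tiles lazily from disk."""
    for t in range(n_tiles):
        yield [(t * tile_size + i) % 256 for i in range(tile_size)]


mean, std = running_stats(synthetic_tiles(4, 256))
```

Because the tiles are produced lazily, memory use is bounded by the tile size rather than by the size of the image set. The same pattern maps naturally onto lazy collections and folds in a functional language.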