Research

I am interested in Image Analysis, Pattern (Symbol) Recognition and Machine Learning in general. The focus of my PhD was Automatic Recognition of Chemical Structure Diagrams. Chemical Structure Diagram Recognition is the process of reading a bitmap image of a chemical molecule and generating an equivalent textual, or tabular, representation.

I developed a tool in OCaml which reads a bitmap image of a molecule and generates the corresponding MOL file. The tool is still being developed and optimized, and will be open sourced in the future.

We named this tool MolRec. It does not rely on any third-party tools to recognize a molecule image. Everything, from OCR, thinning, corner detection ... to building the MOL file, is done from scratch.
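For readers unfamiliar with the output format, the last step, turning recognized atoms and bonds into a MOL file, can be illustrated with a minimal sketch. This is not MolRec code (MolRec is written in OCaml); it is a small Python illustration that assumes a simplified subset of the V2000 MOL layout, with 2D coordinates, no charges and no stereo information. The names `Atom`, `Bond` and `to_molfile` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    symbol: str  # element symbol, e.g. "C" or "O"
    x: float     # 2D coordinates taken from the image (arbitrary units)
    y: float


@dataclass
class Bond:
    begin: int   # 1-based index of the first atom
    end: int     # 1-based index of the second atom
    order: int   # 1 = single, 2 = double, 3 = triple


def to_molfile(name, atoms, bonds):
    """Serialize atoms and bonds as a simplified V2000 MOL file (assumed subset)."""
    # Header block: molecule name, program/timestamp line, comment line.
    lines = [name, "  sketch", ""]
    # Counts line: atom count, bond count, unused fields, version tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    # Atom block: x, y, z coordinates, element symbol, then zeroed optional fields.
    for a in atoms:
        lines.append(
            f"{a.x:10.4f}{a.y:10.4f}{0.0:10.4f} {a.symbol:<3}" + " 0" + "  0" * 11
        )
    # Bond block: first atom, second atom, bond order, stereo flag (0 = none).
    for b in bonds:
        lines.append(f"{b.begin:3d}{b.end:3d}{b.order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


# Heavy-atom skeleton of ethanol: C-C-O with two single bonds.
atoms = [Atom("C", 0.0, 0.0), Atom("C", 1.5, 0.0), Atom("O", 2.25, 1.3)]
bonds = [Bond(1, 2, 1), Bond(2, 3, 1)]
print(to_molfile("ethanol", atoms, bonds))
```

The atom and bond tables in the sketch correspond to the "tabular representation" mentioned above: one row per atom with its position and element, and one row per bond with the two atoms it connects.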

Image to MOL file

In 2011 and 2012:

We participated in the Text REtrieval Conference (TREC) 2011 and the Conference and Labs of the Evaluation Forum (CLEF) 2012. These events aim to encourage research in information retrieval and related applications by providing a large test collection, uniform scoring procedures and a forum for organizations interested in comparing their results. We were given a training set of chemical structures followed by a test set, and were asked to submit the results of several runs. We are pleased with MolRec's performance: it scored the highest recognition rates at both events.

More information about my work can be found on my publications page.

Benchmark Dataset:

We have recently created a benchmark dataset of molecule structure images and their corresponding MOL files. As far as we are aware, it is the largest freely available dataset of its kind, with 5740 images. More information about this dataset can be found here.
