Noureddin Sadawi - My Research

I am interested in Image Analysis, Pattern (Symbol) Recognition and Machine Learning in general. The focus of my PhD is Optical Character Recognition (OCR) and Chemical Structure Recognition. Chemical Structure Recognition, or Molecule Recognition, is the process of reading in a bitmap image of a chemical molecule and generating an equivalent textual or tabular representation.

I have recently developed a tool in OCaml that reads a bitmap molecule image and generates the corresponding MOL file. The tool is still being developed and optimized, and will be open-sourced in the future.
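To make the output side of this process concrete, here is a minimal OCaml sketch of serialising recognised atoms and bonds into a V2000 MOL file. It is not MolRec's code, which is not yet public: the `atom` and `bond` record types and the functions `counts_line`, `atom_line`, `bond_line` and `mol_of` are hypothetical names chosen for illustration. Only the element symbol, 2-D coordinates and bond order are written; every other field is set to zero, and the second header line (normally program/timestamp information) is simply left blank.

```ocaml
(* Hypothetical types for illustration: an atom recognised in the image
   and a bond between two atoms (atom indices are 1-based, as in MOL files). *)
type atom = { symbol : string; x : float; y : float }
type bond = { a1 : int; a2 : int; order : int }

(* [zeros n] produces [n] right-justified 3-character "  0" fields. *)
let zeros n = String.concat "" (List.init n (fun _ -> "  0"))

(* Counts line: number of atoms, number of bonds, eight zero fields,
   then "999" and the version tag. *)
let counts_line n_atoms n_bonds =
  Printf.sprintf "%3d%3d%s999 V2000" n_atoms n_bonds (zeros 8)

(* Atom line: x, y, z (z = 0 for a 2-D diagram), the element symbol
   left-justified in 3 characters, then twelve zero-valued fields. *)
let atom_line a =
  Printf.sprintf "%10.4f%10.4f%10.4f %-3s 0%s" a.x a.y 0.0 a.symbol (zeros 11)

(* Bond line: first atom, second atom, bond order, then stereo and
   three further fields, all zero. *)
let bond_line b = Printf.sprintf "%3d%3d%3d%s" b.a1 b.a2 b.order (zeros 4)

(* Assemble the header (name, blank program line, blank comment),
   the connection table and the terminating "M  END" line. *)
let mol_of ~name atoms bonds =
  let header = [ name; ""; "" ] in
  let body =
    counts_line (List.length atoms) (List.length bonds)
    :: List.map atom_line atoms
    @ List.map bond_line bonds
  in
  String.concat "\n" (header @ body @ [ "M  END" ]) ^ "\n"

(* Usage: heavy atoms of ethanol, C-C-O, with two single bonds. *)
let () =
  let atoms =
    [ { symbol = "C"; x = 0.0; y = 0.0 };
      { symbol = "C"; x = 1.299; y = 0.75 };
      { symbol = "O"; x = 2.598; y = 0.0 } ]
  in
  let bonds = [ { a1 = 1; a2 = 2; order = 1 }; { a1 = 2; a2 = 3; order = 1 } ] in
  print_string (mol_of ~name:"ethanol" atoms bonds)
```

The fixed-width padding is deliberate: in the V2000 format the counts line is 39 characters, each atom line 69 and each bond line 21, which is why every numeric field goes through `%3d` or `%10.4f`.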

The tool, named MolRec, does not rely on any third-party tools to recognize a molecule image. Everything, from OCR, thinning, corner detection ... to building the MOL file, is implemented from scratch.
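Thinning reduces thick strokes to one-pixel-wide skeletons so that lines and their junctions can be traced. This page does not say which thinning algorithm MolRec uses; the sketch below assumes the classic Zhang-Suen algorithm, and the `thin` function and its representation of a binary image as an `int array array` of 0/1 pixels are hypothetical choices made for illustration.

```ocaml
(* Zhang-Suen thinning (an assumed algorithm, not necessarily MolRec's)
   on a binary image stored as rows of 0/1 ints. Pixels outside the
   image are treated as background (0). The input is not modified. *)
let thin (img : int array array) : int array array =
  let h = Array.length img in
  let w = if h = 0 then 0 else Array.length img.(0) in
  let g = Array.map Array.copy img in
  let px r c = if r < 0 || r >= h || c < 0 || c >= w then 0 else g.(r).(c) in
  (* Neighbours P2..P9, clockwise starting from north. *)
  let neighbours r c =
    [| px (r - 1) c; px (r - 1) (c + 1); px r (c + 1); px (r + 1) (c + 1);
       px (r + 1) c; px (r + 1) (c - 1); px r (c - 1); px (r - 1) (c - 1) |]
  in
  (* One sub-iteration: mark deletable pixels, delete them all at once,
     and report whether anything was deleted. *)
  let sub_iteration first =
    let to_delete = ref [] in
    for r = 0 to h - 1 do
      for c = 0 to w - 1 do
        if g.(r).(c) = 1 then begin
          let n = neighbours r c in
          (* B: number of foreground neighbours *)
          let b = Array.fold_left ( + ) 0 n in
          (* A: number of 0 -> 1 transitions in the cyclic sequence P2..P9,P2 *)
          let a = ref 0 in
          for i = 0 to 7 do
            if n.(i) = 0 && n.((i + 1) mod 8) = 1 then incr a
          done;
          let p2, p4, p6, p8 = n.(0), n.(2), n.(4), n.(6) in
          let cond =
            if first then p2 * p4 * p6 = 0 && p4 * p6 * p8 = 0
            else p2 * p4 * p8 = 0 && p2 * p6 * p8 = 0
          in
          if b >= 2 && b <= 6 && !a = 1 && cond then
            to_delete := (r, c) :: !to_delete
        end
      done
    done;
    List.iter (fun (r, c) -> g.(r).(c) <- 0) !to_delete;
    !to_delete <> []
  in
  (* Alternate the two sub-iterations until neither deletes a pixel. *)
  let changed = ref true in
  while !changed do
    let c1 = sub_iteration true in
    let c2 = sub_iteration false in
    changed := c1 || c2
  done;
  g

(* Usage: thin a 3-pixel-thick horizontal bar and print it. *)
let () =
  let bar =
    Array.init 5 (fun r ->
        Array.init 9 (fun c -> if r >= 1 && r <= 3 && c >= 1 && c <= 7 then 1 else 0))
  in
  thin bar
  |> Array.iter (fun row ->
         Array.iter (fun p -> print_char (if p = 1 then '#' else '.')) row;
         print_newline ())
```

The two sub-iterations peel boundary pixels from opposite sides in turn, and the loop stops only when neither removes anything, so calling `thin` again on its own output leaves it unchanged.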

Image to MOL file
Recently, we participated in the Text REtrieval Conference (TREC) 2011. The event aims to encourage research in information retrieval and related applications by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. We were given a training set of chemical structures, followed by a test set, and were asked to submit the results of two runs. We are happy with MolRec's performance.

More information about my work can be found in the following papers and presentation slides:

Publications

Research Visits & Summer Schools:

Benchmark Dataset:

We have recently created a benchmark dataset of 5740 molecule structure images and their corresponding MOL files. As far as we are aware, it is the largest freely available dataset of its kind. More information about this dataset can be found here.


I have also been involved in many other activities, such as: