Noureddin Sadawi - My Research
I am interested in Image Analysis, Pattern (Symbol) Recognition and Machine Learning in general. My PhD focuses on Optical Character Recognition (OCR) and Chemical Structure Recognition. Chemical Structure Recognition, also called Molecule Recognition, is the process of reading a bitmap image of a chemical molecule and generating an equivalent textual or tabular representation.
I have recently developed a tool in OCaml which reads a bitmap molecule image and generates its corresponding MOL file. The tool is still being developed and optimized, and will be open sourced in the future.
The tool, named MolRec, does not rely on any third-party tools to recognize a molecule image. Everything, from OCR, thinning and corner detection through to building the MOL file, is done from scratch.
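To make the last step of that pipeline concrete, here is a minimal sketch of how a recognized molecule graph could be serialized as a MOL (V2000) block. It is written in Python for illustration only and is not MolRec's code: the names `Atom`, `Bond` and `to_molblock` are hypothetical, and the sketch assumes the earlier stages have already produced 2D atom positions and bond orders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    symbol: str  # element symbol, e.g. "C"
    x: float     # 2D coordinates recovered from the image
    y: float


@dataclass
class Bond:
    begin: int   # 1-based index of the first atom
    end: int     # 1-based index of the second atom
    order: int   # 1 = single, 2 = double, 3 = triple


def to_molblock(name: str, atoms: List[Atom], bonds: List[Bond]) -> str:
    """Serialize atoms and bonds as a V2000 MOL block (hypothetical helper)."""
    # V2000 uses fixed three-character count fields, hence the 999 limit.
    if len(atoms) > 999 or len(bonds) > 999:
        raise ValueError("V2000 supports at most 999 atoms and 999 bonds")

    # Header block: molecule name, program/timestamp line (left blank), comment.
    lines = [name, "", "illustrative sketch"]

    # Counts line: atom count, bond count, eight unused fields, 999, version tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")

    # Atom block: x, y, z (z is 0 for a flat depiction), symbol, twelve zero fields.
    for a in atoms:
        lines.append(
            f"{a.x:10.4f}{a.y:10.4f}{0.0:10.4f} {a.symbol:<3}{0:2d}" + "  0" * 11
        )

    # Bond block: first atom, second atom, bond order, four zero fields.
    for b in bonds:
        lines.append(f"{b.begin:3d}{b.end:3d}{b.order:3d}" + "  0" * 4)

    lines.append("M  END")
    return "\n".join(lines) + "\n"
```

For example, the heavy-atom skeleton of ethanol could be passed as three atoms (C, C, O) and two single bonds; the result is a ten-line block ending in `M  END`.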
We participated in the Text REtrieval Conference (TREC) 2011. The event aims to encourage research in information retrieval and related applications by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. We were given a training set of chemical structures, followed by a test set. We were asked to submit the results of two runs, and we are happy with MolRec's performance.
More information about my work can be found in the following papers and presentation slides:
Publications
- Paper 1: Recognising Chemical Formulas from Molecule Depictions. Pre-proceedings of GREC 2009.
- Slides: Presentation at GREC 2009.
- Paper 2: Chemical Structure Recognition: a rule-based approach. Appeared at Document Recognition and Retrieval XIX (DRR2012).
- Paper 3: Performance of MolRec at TREC 2011's I2S Task. Appeared at Text REtrieval Conference TREC 2011.
- Poster: A Poster about MolRec and its performance at TREC 2011. Appeared at Text REtrieval Conference TREC 2011.
- Talk 1: MolRec and its performance at TREC 2011. Prepared and delivered by Alan Sexton.
- Talk 2: Chemical Structure Recognition: a rule-based approach. Prepared and delivered by Volker Sorge.
Research Visits & Summer Schools:
- I attended the 6th International Summer School and Workshop on Pattern Recognition (ISSPR 2010) in Plymouth, UK.
- I also visited Suzuki Lab in Fukuoka, Japan.
Benchmark Dataset:
We have recently created a benchmark dataset of molecule structure images and their corresponding MOL files. It contains 5740 images and is, as far as we are aware, the largest freely available dataset of its kind. More information about this dataset can be found here.
Other activities I have been, or am currently, involved in:
- I was a member of the organizing committee of the Conferences on Intelligent Computer Mathematics (CICM) 2008. Click here for the poster.
- I organize the weekly (previously biweekly) meetings of the Scientific Document Analysis Group (SDAG).
- I provide voluntary guidance and consultation for various types of projects. Previous projects have included Pattern Recognition and Image Analysis applications, database applications, web applications, network applications and others.