Mohamed Alkalai - Research
Research Title:
Table Recognition in Scientific Documents
Over the last decade, much research has been done on document recognition, primarily focusing on the correct extraction and analysis of text. In comparison, little work has been devoted to the recognition of tables, despite the fact that tables are ubiquitous in all types of documents. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed manner. However, many current applications have little support for table recognition. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make table recognition a challenging problem. This research focuses in particular on the recognition of tables in scientific documents such as mathematics papers, which usually have quite complex and diverse compositions. We are undertaking a thorough study of the spatial and logical structure of tables in documents from a range of scientific disciplines and developing adequate recognition techniques to deal with them. The interest is not only in the structure but also in the content of tables, which can often contain mathematical expressions and equations.
Papers
- Mohamed Alkalai and Volker Sorge, "Issues in Mathematical Table Recognition", Conferences on Intelligent Computer Mathematics (CICM) 2012
- Mohamed Alkalai, Josef B. Baker, Volker Sorge and Xiaoyan Lin, "Improving Formula Analysis with Line and Mathematics Identification", Proc. 12th Int. Conf. on Document Analysis and Recognition (ICDAR) 2013
- Xiaoyan Lin, Liangcai Gao, Zhi Tang, Josef Baker, Mohamed Alkalai and Volker Sorge, "A Text Line Detection Method for Mathematical Formula Recognition", Proc. 12th Int. Conf. on Document Analysis and Recognition (ICDAR) 2013
- Mohamed Alkalai and Volker Sorge, "A Histogram-based Approach to Mathematical Line Segmentation", Proc. 18th Iberoamerican Congress on Pattern Recognition (CIARP) 2013, to appear.
- Mohamed Alkalai, "Recognising Tabular Mathematical Expressions using Graph Rewriting", Proc. 18th Iberoamerican Congress on Pattern Recognition (CIARP) 2013, to appear.
Other Work
- A Doctoral Programme paper about my work, submitted to CICM 2012
Links
- PDFCompressor: An excellent tool for compressing and decompressing PDF files, extracting text, producing metrics and much more.
- PDF Tool Kit: Pdftk allows you to manipulate PDF files easily and freely. It does not require Acrobat, and it runs on Windows, Linux, Mac OS X, FreeBSD and Solaris.
- INFTY Project: A research group concentrating on scientific document analysis.