Mohamed Alkalai - Research

Research Title:

Table Recognition in Scientific Documents

Over the last decade, much research has been done on document recognition, primarily focusing on the correct extraction and analysis of text. In comparison, little work has been devoted to the recognition of tables, despite the fact that tables are ubiquitous in all types of documents. In scientific documents, tables are widely used to present experimental results or statistical data in a condensed manner. However, many current applications have little support for table recognition. The difficulty of automatically extracting tables from untagged documents, the lack of a universal table metadata specification, and the limitations of existing ranking schemes make table recognition a challenging problem. This research focuses in particular on the recognition of tables in scientific documents such as math papers, which usually have quite complex and diverse compositions. We are undertaking a thorough study of the spatial and logical structure of tables in documents from a range of scientific disciplines and developing suitable recognition techniques to deal with them. The interest is not only in the structure but also in the content of tables, which can often contain mathematical expressions and equations.

Papers

Other Work

Links

  • PDFCompressor: An excellent tool for compressing and decompressing PDF files, extracting text, producing metrics and much more.
  • PDF Tool Kit: Pdftk allows you to manipulate PDF files easily and freely. It does not require Acrobat, and it runs on Windows, Linux, Mac OS X, FreeBSD and Solaris.
  • INFTY Project: A research group concentrating on scientific document analysis.