Scientific Document Analysis and Abramowitz and Stegun
While document analysis is a rich and varied field of research, only a relatively small part of that field has been devoted to the analysis of scientific documents containing mathematics and technical diagrams. One of the problems that researchers in this sub-area are faced with is the lack of shared datasets that can be used both as test resources and for comparison of algorithms and systems.
Abramowitz and Stegun provides an excellent source of such shared data. As it was published by the United States Government Printing Office, it is copyright free and hence fully available for researchers (and anyone else) to scan, report on, and make their results freely available. Its history and the respect in which scientists have held the book make it an authoritative source for many types of expressions, diagrams and tables. The fact that it was printed in the pre-LaTeX days means that it can help researchers to avoid over-tuning their systems to the much more readily available TeX/LaTeX sourced documents.
This project aims to develop an open, freely shareable set of resources, based on a high quality scan of Abramowitz and Stegun, covering all stages and intermediate result datasets in a toolchain for document analysis of scientific documents.
A paper describing this project has been published as: Alan P. Sexton, "Abramowitz and Stegun - A Resource for Mathematical Document Analysis", in Conferences on Intelligent Computer Mathematics (CICM 2012), Springer Berlin / Heidelberg, vol. 7362, pp. 159-168, 2012.
As mentioned above, Abramowitz and Stegun was published by the United States Government Printing Office and is copyright free. My intention was to release my scans and various processed results as copyright free too, but I have since discovered via legal advice that the act of producing the scans created a copyright on them that cannot simply be ignored. As a result, my previous statements to that effect on this website left potential users uncertain about the legal status of the scans and were a barrier to their free use.
Therefore I release all images, data sets, data files and materials on this project web site under the Creative Commons Attribution 3.0 Unported license (CC BY 3.0).
A Resource for Scientific Document Analysis: Abramowitz and Stegun by Alan P. Sexton is licensed under a Creative Commons Attribution 3.0 Unported License.
A different scanned copy of Abramowitz and Stegun is available from the very nice site: http://people.math.sfu.ca/~cbm/aands/. In particular, this site includes an HTML interface to the book as well as a downloadable PDF. For document analysis purposes, however, its JPG format images are less suitable than TIFF, and there are a few pages missing from the scan.
Abramowitz and Stegun is no longer available from the US Government Printing Office. However, Dover have produced a copy that is based on the same final printing used in this scan. In their preface, they say they have added additional corrections to 9 pages. These corrections seem very minor. For example, they have removed a footnote from page 825, and added one to page 934. They seem to have used the same printmasters as in the GPO version, since the same printing flaws appear. The book is available at amazon.com or amazon.co.uk.
The National Institute of Standards and Technology have continued the work of Abramowitz and Stegun by developing a new Digital Library of Mathematical Functions, which provides a huge expansion over the contents of Abramowitz and Stegun, sophisticated searchability, accessibility and many advanced features. For those who like to read from paper, they have published a book version: amazon.com or amazon.co.uk.
Thanks to Bruce Miller of the National Institute of Standards and Technology, who sent me a clean new copy of the book for scanning.
Thanks to Bruno Voisin, of the Laboratory of Geophysical and Industrial Flows (LEGI), Grenoble, France, who graciously gave me a copy of his code to create PDF bookmarks for Abramowitz and Stegun as a basis for the bookmarks in the PDFs on this page.
Scanning was carried out on an inexpensive Epson 3170 Photo scanner using XSane under Ubuntu Linux to obtain initial PNM files.
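For readers who want to work with raw scanner output of this kind, the following is a minimal sketch (not the tooling used in this project) of reading the header of a PNM file to check its type and dimensions before further processing. The function name read_pnm_header and the sample bytes are illustrative assumptions, and the comment handling is simplified: a "#" is treated as starting a comment that runs to the end of the line.

```python
import io

PNM_MAGICS = {"P1", "P2", "P3", "P4", "P5", "P6"}


def read_pnm_header(stream):
    """Return (magic, width, height, maxval) from a binary PNM stream.

    maxval is None for the bitmap formats (P1/P4), which have no maxval field.
    Hypothetical helper for illustration only.
    """

    def next_token():
        # Collect one whitespace-delimited token, skipping '#' comments.
        token = b""
        while True:
            ch = stream.read(1)
            if ch == b"":
                break  # end of stream
            if ch == b"#":
                # Discard the rest of the comment line.
                while ch not in (b"\n", b"\r", b""):
                    ch = stream.read(1)
                if token:
                    break
                continue
            if ch.isspace():
                if token:
                    break  # the delimiter after the token is consumed
                continue
            token += ch
        if not token:
            raise ValueError("truncated PNM header")
        return token.decode("ascii")

    magic = next_token()
    if magic not in PNM_MAGICS:
        raise ValueError("not a PNM file: %r" % magic)
    width = int(next_token())
    height = int(next_token())
    maxval = None if magic in ("P1", "P4") else int(next_token())
    return magic, width, height, maxval


# Made-up example: a tiny greyscale (P5) header followed by dummy pixel data.
sample = b"P5\n# scanned page\n640 480\n255\n" + bytes(4)
header = read_pnm_header(io.BytesIO(sample))
print(header)
```

To inspect a real file, the same function can be applied to a file opened in binary mode, e.g. `open(path, "rb")`, where `path` names one of the PNM scans.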
If you use these images in your research on document analysis or mathematical knowledge management, please let me know so that I can add links to your project on this page and can keep you informed of any updates or changes.
If you would like to provide further data on these pages, e.g. optical character recognition results, MathML output from formula recognition tools etc., please let me know so that I can add them to this page.
For further work in this area, see the Scientific Document Analysis Group web pages.