While document analysis is a rich and varied field of research, only a relatively small part of that field has been devoted to the analysis of scientific documents containing mathematics and technical diagrams. One of the problems that researchers in this sub-area are faced with is the lack of shared datasets that can be used both as test resources and for comparison of algorithms and systems.
Abramowitz and Stegun provides an excellent source of such shared data. Because it was published by the United States Government Printing Office, it is copyright free and hence fully available for researchers (and anyone else) to scan, to report on, and to make their results freely available. Its history, and the respect in which scientists have held the book, make it an authoritative source for many types of expressions, diagrams and tables. Because it was printed in the pre-LaTeX era, it can also help researchers avoid over-tuning their systems to the much more readily available documents produced with TeX/LaTeX.
Copyright
All image downloads from this page are provided copyright free. As a professional courtesy, I ask only that you acknowledge the source (http://www.cs.bham.ac.uk/~aps/research/projects/as/) if you use them in published work.
This page contains downloadable, scanned images of the final printing of Abramowitz and Stegun: “10th printing, December 1972, with corrections”. The complete book (1060 pages including the front matter) can be downloaded freely from this page in three formats: a multipage, binarised, deskewed TIFF file at 300 pixels per inch in which all connected components (i.e. connected ink blots) smaller than the smallest legitimate printed dot have been removed (54 MBytes); a similarly processed TIFF file at 600 pixels per inch (111 MBytes); and a PDF file, in both A4 and U.S. Letter formats, with bookmarked links to all sections, subsections and tables. The PDF is based on the 300 pixels per inch version.
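For readers who want to apply the same kind of small-component filtering to other scans, a minimal sketch follows. It is not the Gamera-based processing actually used for these files: it is a plain flood fill over a 2-D boolean NumPy array (True = ink) using 8-connectivity, and the hypothetical `min_area` parameter stands in for the size of the smallest genuine dot, which would have to be measured from the pages themselves.

```python
from collections import deque

import numpy as np


def remove_small_components(ink: np.ndarray, min_area: int) -> np.ndarray:
    """Return a copy of `ink` with 8-connected components of fewer than
    `min_area` pixels erased. `ink` is a 2-D bool array, True = black pixel.
    `min_area` is a hypothetical threshold, not a value from the original pipeline."""
    h, w = ink.shape
    seen = np.zeros_like(ink, dtype=bool)
    out = ink.copy()
    for y in range(h):
        for x in range(w):
            if not ink[y, x] or seen[y, x]:
                continue
            # Breadth-first flood fill collecting every pixel of this component.
            queue = deque([(y, x)])
            seen[y, x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and ink[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            # Erase components smaller than the threshold (treated as specks).
            if len(pixels) < min_area:
                for py, px in pixels:
                    out[py, px] = False
    return out
```

A pure-Python flood fill is slow on a full 300 or 600 pixels per inch page; `scipy.ndimage.label` with a 3x3 all-ones `structure` produces the same 8-connected labelling much faster, after which component sizes can be counted with `np.bincount`.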
Although the complete book in binarised form, i.e. in black and white, is available from this web page, the original scanned images, which are in 8 bit grey scale, come to a total of 19 GBytes as 600 pixels per inch TIFF files using zip (deflate) compression. Because of the size of this collection, they are not available online. Instead I have made a selection of the original grey page scans available. If you have a need for the full collection of grey page scans, please contact me explaining your requirement and I will try to overcome the logistical problems of providing them to you.
The files contain all pages in the book, including the front and back matter and blank pages separating sections. The latter were included so that the relationship between TIFF/PDF file page numbers and book page numbers would remain as simple as possible.
Note that the original printing process left a significant number of printing flaws in the book itself, so many flaws in the scanned images are faithful reproductions of these printing flaws rather than artifacts of the scanning process. In particular, most pages of the book have some slight skew, up to 1.35 degrees in the worst cases. While the scanning process undoubtedly introduced some skew, most of it is present in the original book. Deskewing was carried out automatically using a projection profile approach; although it is by no means perfect, it reduced the skew in every case.
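The page does not give the details of the projection profile method used, so the following is only a minimal sketch of one common variant, assuming a 2-D boolean NumPy array with True = ink and row 0 at the top. For each candidate angle it projects the ink pixels onto the direction perpendicular to hypothesised text lines and scores the resulting profile by its sum of squares: when the angle matches the skew, each text line collapses into a narrow, tall peak and the score is highest. The function name, the ±2 degree search range (chosen to cover the 1.35 degree worst case) and the number of steps are assumptions.

```python
import numpy as np


def estimate_skew(ink: np.ndarray, max_angle: float = 2.0, steps: int = 81) -> float:
    """Estimate page skew in degrees by maximising projection-profile sharpness.

    `ink` is a 2-D bool array (True = black pixel, row 0 at the top).
    A positive result means text lines descend to the right.
    """
    ys, xs = np.nonzero(ink)
    if ys.size == 0:
        return 0.0
    best_angle, best_score = 0.0, -1.0
    for angle in np.linspace(-max_angle, max_angle, steps):
        theta = np.deg2rad(angle)
        # Offset of each ink pixel perpendicular to a line of slope tan(theta);
        # pixels on the same skewed text line share (almost) the same offset.
        r = ys * np.cos(theta) - xs * np.sin(theta)
        bins = np.round(r - r.min()).astype(np.int64)
        profile = np.bincount(bins).astype(np.float64)
        # Sharp profiles (ink concentrated in few bins) have a large sum of squares.
        score = float(np.sum(profile ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle
```

This only estimates the angle; correcting the page would then mean rotating it by the negative of the estimate, and the accuracy is limited by the angular step (0.05 degrees with the defaults above).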
Complete Book in Binarised (Black and White) Form

| Note | Description | Format | Size | Download |
|---|---|---|---|---|
| If you just want a scanned copy of Abramowitz and Stegun to read, download this. | Complete book (1060 pages) in PDF format with bookmarks: 300 pixels per inch, binarised, deskewed, and with small connected components removed | PDF | 55 MBytes | AandS-a4-v1-1.pdf (A4 format); AandS-letter-v1-1.pdf (U.S. Letter format) |
| If you are interested in doing some document analysis on the pages of the book, this is probably what you want. | Complete book (1060 pages) at 300 pixels per inch, binarised, deskewed, G4-compressed TIFF with small connected components removed | Single multipage TIFF | 54 MBytes | AandS-mono300.tif |
| | | Tar gzip of single page TIFFs | 51 MBytes | AandS-mono300.tgz |
| If you are interested in investigating the difference that higher resolutions make in document analysis, this might be useful. | Complete book (1060 pages) at 600 pixels per inch, binarised, deskewed, G4-compressed TIFF with small connected components removed | Single multipage TIFF | 111 MBytes | AandS-mono600.tif |
| | | Tar gzip of single page TIFFs | 107 MBytes | AandS-mono600.tgz |
The following are samples from the original scans without any image processing applied. They are all 600 pixels per inch, 8 bit grey scale, deflate compressed, single page TIFF images. Each image is between 17.5 and 20.5 MBytes. They are provided both for the curious and for anyone interested in using them for research on binarisation, noise reduction, de-skewing, or grey scale optical character recognition.
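For anyone experimenting with binarisation on these grey scans, a minimal baseline is Otsu's global threshold, sketched below. This page does not say which method produced the black and white versions, so Otsu is swapped in here purely for illustration. The sketch assumes a page already loaded as a 2-D `uint8` NumPy array with 0 = black; `otsu_threshold` and `binarise` are hypothetical names.

```python
import numpy as np


def otsu_threshold(grey: np.ndarray) -> int:
    """Return the grey level t maximising between-class variance, where the
    'ink' class is pixels <= t. `grey` must be a uint8 array."""
    hist = np.bincount(grey.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    total = hist.sum()
    w0 = np.cumsum(hist)               # pixel count with value <= t
    w1 = total - w0                    # pixel count with value > t
    sum0 = np.cumsum(hist * levels)    # intensity sum of the <= t class
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = sum0 / w0
        mu1 = (sum0[-1] - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
    # Thresholds leaving one class empty give NaN; treat them as worst.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))


def binarise(grey: np.ndarray) -> np.ndarray:
    """True where the pixel is at or below the Otsu threshold (i.e. ink)."""
    return grey <= otsu_threshold(grey)
```

A single global threshold is a deliberately simple choice; pages with uneven illumination or show-through may need a local (adaptive) method instead.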
A different scanned copy of Abramowitz and Stegun is available from the very nice site http://people.math.sfu.ca/~cbm/aands/. In particular, this site includes an HTML interface to the book as well as a downloadable PDF. For document analysis purposes, however, its JPG images are less suitable than TIFF images, and a few pages are missing from that scan.
Abramowitz and Stegun is no longer available from the US Government Printing Office. However, Dover have produced a copy based on the same final printing used for this scan. In their preface, they state that they have made additional corrections to 9 pages. These corrections seem very minor; for example, they have removed a footnote from page 825 and added one to page 934. They seem to have used the same printing masters as the GPO version, since the same printing flaws appear. The book is available at amazon.com or amazon.co.uk.
The National Institute of Standards and Technology have continued the work of Abramowitz and Stegun by developing a new Digital Library of Mathematical Functions, which greatly expands on the contents of Abramowitz and Stegun and adds sophisticated search, accessibility and many other advanced features. For those who prefer to read on paper, they have also published a book version: amazon.com or amazon.co.uk.
Thanks to Bruce Miller of the National Institute of Standards and Technology, who sent me a clean new copy of the book for scanning.
Thanks to Bruno Voisin, of the Laboratory of Geophysical and Industrial Flows (LEGI), Grenoble, France, who graciously gave me a copy of his code to create PDF bookmarks for Abramowitz and Stegun as a basis for the bookmarks in the PDFs on this page.
Scanning was carried out on an inexpensive Epson 3170 Photo scanner using XSane under Ubuntu Linux to obtain initial PNM files.
Image processing was carried out using the excellent Gamera Toolkit and the libtiff library.
If you use these images in your research on document analysis or mathematical knowledge management, please let me know so that I can add links to your project on this page and can keep you informed of any updates or changes.
If you would like to provide further data on these pages, e.g. optical character recognition results, MathML output from formula recognition tools etc., please let me know so that I can add them to this page.
For further work in this area, see the Scientific Document Analysis Group web pages.
Alan P. Sexton
University of Birmingham
School of Computer Science
Email: A.P.Sexton@cs.bham.ac.uk