Overview of the INEX 2009 book track
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
Rule based document understanding of historical books using a hybrid fuzzy classification system
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
Scanned and OCRed documents usually lack detailed layout and structural information. We present a book-specific layout analysis system that extracts table of contents (TOC) structure information from scanned and OCRed books. This system was used for navigation purposes by the live books search project. We describe a labeling scheme for the TOC sections of books, give a high-level overview of the book layout analysis system, and describe the TOC Structure Extraction Engine. Finally, we present accuracy measurements of the system on a representative test set.
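To make the idea of TOC structure extraction concrete, here is a minimal sketch of turning one OCRed TOC line into a labeled entry. It is not the system described above: the `TocEntry` type, the `parse_toc_line` function, the regular expression, and the rule that nesting depth follows the section number are all assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional


class TocEntry(NamedTuple):
    """Hypothetical label set for one TOC line (an assumption, not the paper's scheme)."""
    level: int   # nesting depth inferred from the section number, 1 = top level
    title: str   # entry title with dot leaders removed
    page: int    # printed page number at the end of the line


# Assumed line shape: optional section number, title, dot leaders or spaces, page number.
TOC_LINE = re.compile(
    r"^\s*(?P<num>\d+(?:\.\d+)*)?\.?\s*(?P<title>.+?)[\s.]*(?P<page>\d+)\s*$"
)


def parse_toc_line(line: str) -> Optional[TocEntry]:
    """Return a TocEntry for a line that looks like a TOC entry, else None."""
    match = TOC_LINE.match(line)
    if match is None:
        return None
    title = match.group("title").strip()
    # Reject lines whose "title" has no letters (for example, a bare page number).
    if not any(ch.isalpha() for ch in title):
        return None
    num = match.group("num")
    # "1.2" has one dot, so it is treated as depth 2; unnumbered lines default to depth 1.
    level = num.count(".") + 1 if num else 1
    return TocEntry(level=level, title=title, page=int(match.group("page")))


if __name__ == "__main__":
    for raw in ["1.2 Early Printing ...... 17", "3 Bookbinding . . . 40", "Preface"]:
        print(repr(raw), "->", parse_toc_line(raw))
```

A pattern like this only covers clean, single-line entries. Real OCR output typically has broken lines, misread digits, and roman-numeral page numbers, which is why a dedicated layout analysis stage is needed before any such parsing.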