We describe a tool for Table of Contents (ToC) identification and recognition in PDF books. This task is part of ongoing research on tools for the semi-automatic conversion of PDF documents to the Epub format, which can be read on several e-book devices. Among the various sub-tasks, ToC extraction and recognition is particularly useful for easy navigation of book contents. The proposed tool first identifies the ToC pages. It then locates the bounding boxes of the corresponding titles in the book body, so that suitable links can be added to the Epub ToC. The approach is tolerant to discrepancies between the ToC text and the corresponding titles. We evaluated the tool on several open access books edited by University Presses that are partners of the OAPEN EcontentPlus project.
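The linking step can be illustrated with a minimal sketch. It is not the tool's actual implementation: the abstract does not say how discrepancies are tolerated, so the sketch assumes a normalized string-similarity score (Python's standard `difflib.SequenceMatcher`) and a hypothetical threshold of 0.8. The names `normalize`, `similarity`, and `link_toc_entries`, the `(page, bbox, text)` layout of body lines, and the assumption that the page number printed in the ToC matches the body page index are all illustrative.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Similarity in [0, 1] between two titles after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_toc_entries(toc_entries, body_lines, threshold=0.8):
    """Map each ToC title to the (page, bbox) of its best match in the body.

    toc_entries: list of (title, page) pairs read from the ToC pages.
    body_lines:  list of (page, bbox, text) tuples for text lines in the body.
    threshold:   hypothetical cut-off; entries scoring below it stay unlinked.
    """
    links = {}
    for title, page in toc_entries:
        # Assumption: the ToC page number equals the body page index.
        candidates = [line for line in body_lines if line[0] == page]
        best = max(
            candidates,
            key=lambda line: similarity(title, line[2]),
            default=None,
        )
        if best is not None and similarity(title, best[2]) >= threshold:
            links[title] = (best[0], best[1])
    return links
```

Under these assumptions, a ToC entry such as "2. Related Work" would still be linked to a body heading printed as "2 Related Works", while an entry with no sufficiently similar line on its page is left without a link rather than attached to the wrong bounding box.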