We present a method for structuring a document according to the information present in its organizational tables, such as the table of contents and tables of figures. The method follows a two-step approach that leverages both functional and formal (layout-based) knowledge. The functional definition of an organizational table, based on five properties, provides a first solution, which the second step improves by automatically learning the form of the table of contents. We also report on the robustness and performance of the method and illustrate its use in a real conversion case.