Structuring documents according to their table of contents

Authors:
Hervé Déjean; Jean-Luc Meunier
Affiliations:
Xerox Research Centre Europe, Meylan, France; Xerox Research Centre Europe, Meylan, France
Venue:
Proceedings of the 2005 ACM symposium on Document engineering
Year:
2005

Abstract

In this paper, we present a method for structuring a document according to the information present in its Table of Contents (ToC). Both the detection of the ToC and the identification of the document-body parts its entries refer to rely on a set of generic properties that characterize any ToC, while the hierarchical structure of the ToC is determined using clustering techniques. We also report on the method's robustness and performance, and discuss it in light of related work.
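To make the two stages concrete, here is a minimal Python sketch written under explicit assumptions. It is not the authors' implementation: the abstract does not list the generic ToC properties or the clustering features. The sketch assumes (hypothetically) that each ToC entry is linked to the body line with the highest textual similarity (a `difflib.SequenceMatcher` ratio above an arbitrary `threshold`), that links must follow the order of the ToC entries, and that hierarchy levels can be recovered by clustering one layout feature, the entry's left indentation, with a simple gap-based 1-D clustering. The names `TocEntry`, `link_entries` and `assign_levels`, as well as the `threshold` and `gap` values, are illustrative choices.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TocEntry:
    text: str
    indent: float  # hypothetical layout feature: left x-coordinate of the entry


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_entries(entries, body_lines, threshold=0.8):
    """Map each ToC entry to the index of the body line it most likely refers to.

    Assumption: links respect the ToC order, so the search for each entry
    starts just after the previously linked body line. Entries whose best
    similarity stays below `threshold` are mapped to None.
    """
    links = []
    start = 0
    for entry in entries:
        best_idx, best_score = None, 0.0
        for i in range(start, len(body_lines)):
            score = similarity(entry.text, body_lines[i])
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx is not None and best_score >= threshold:
            links.append(best_idx)
            start = best_idx + 1
        else:
            links.append(None)
    return links


def assign_levels(entries, gap=5.0):
    """Cluster entries by indentation and return one hierarchy level per entry.

    Sorted distinct indentation values are chained into the same cluster while
    consecutive values differ by at most `gap` (single-linkage in 1-D).
    Clusters are ranked left to right, so level 0 is the least indented.
    """
    cluster_of = {}
    level, prev = 0, None
    for value in sorted({e.indent for e in entries}):
        if prev is not None and value - prev > gap:
            level += 1
        cluster_of[value] = level
        prev = value
    return [cluster_of[e.indent] for e in entries]


# Toy example: a ToC with two indentation levels and a matching body.
toc = [
    TocEntry("1 Introduction", 72.0),
    TocEntry("2 Method", 72.0),
    TocEntry("2.1 ToC detection", 90.0),
    TocEntry("2.2 Hierarchization", 91.5),
    TocEntry("3 Evaluation", 72.0),
]
body = ["Title page", "1 Introduction", "some text", "2 Method",
        "2.1 ToC Detection", "more text", "2.2 Hierarchization", "3 Evaluation"]

print(link_entries(toc, body))  # [1, 3, 4, 6, 7]
print(assign_levels(toc))       # [0, 0, 1, 1, 0]
```

In real documents a ToC entry rarely matches its body heading exactly (page numbers, dot leaders, truncated titles), and indentation alone may not separate levels; other layout cues such as font size or numbering patterns would be natural additional clustering features in this kind of sketch.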