On tables of contents and how to recognize them

  • Authors:
  • Hervé Déjean; Jean-Luc Meunier

  • Affiliations:
  • Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240, Meylan, France (both authors)

  • Venue:
  • International Journal on Document Analysis and Recognition
  • Year:
  • 2009

Abstract

We present a method for structuring a document according to the information in its organizational tables, such as the table of contents and tables of figures. The method takes a two-step approach that leverages both functional and formal (layout-based) knowledge. A functional definition of an organizational table, based on five properties, provides a first solution; a second step improves it by automatically learning the form of the table of contents. We also report on the robustness and performance of the method and illustrate its use in a real conversion case.