Near-wordless document structure classification

  • Authors:
  • K. Summers

  • Affiliations:
  • -

  • Venue:
  • ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1) - Volume 1
  • Year:
  • 1995

Quantified Score

Hi-index 0.00

Visualization

Abstract

Automatic derivation of logical document structure from generic layout would enable the development of many highly flexible electronic document manipulation tools. This problem can be divided into the segmentation of text into pieces and the classification of these pieces as particular logical structures. This paper proposes an approach to the classification of logical document structures, according to their distance from predefined prototypes. The prototypes consider linguistic information minimally, thus relying minimally on the accuracy of OCR and decreasing language-dependence. Different classes of logical structures and the differences in the requisite information for classifying them are discussed. A prototype format is proposed, existing prototypes and a distance measurement are described, and performance results are provided.