Thick 2D relations for document understanding

  • Authors:
  • Marco Aiello; Arnold M. W. Smeulders

  • Affiliations:
  • Department of Information and Telecommunication Technologies, University of Trento, Via Sommarive 14, 38050 Trento, Italy and Intelligent Sensory Information Systems, University of Amsterdam, Krui ...;Intelligent Sensory Information Systems, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands

  • Venue:
  • Information Sciences—Informatics and Computer Science: An International Journal
  • Year:
  • 2004

Abstract

We use a propositional language of qualitative rectangle relations to detect the reading order from document images. To this end, we define the notion of a document encoding rule and analyze possible formalisms, such as LaTeX and SGML, for expressing such rules. Document encoding rules expressed in the propositional language of rectangles are then used to build a reading order detector for document images. To achieve robustness and avoid brittleness when applying the system to real-life document images, we introduce the notion of a thick boundary interpretation for a qualitative relation. The framework is tested on a collection of heterogeneous document images, showing recall rates of up to 89%.
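
The idea of a thick boundary can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on two assumptions the abstract does not state: that a rectangle relation is a pair of Allen interval relations (one per axis), and that "thick" means two coordinates closer than a tolerance t are treated as coinciding. The names cmp_thick, interval_relation, and rectangle_relation are hypothetical, and intervals are assumed to be longer than t.

```python
from typing import Tuple

Interval = Tuple[float, float]            # (start, end), with start < end
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) bounding box


def cmp_thick(p: float, q: float, t: float) -> int:
    """Three-way comparison in which points within distance t count as equal."""
    if abs(p - q) <= t:
        return 0
    return -1 if p < q else 1


def interval_relation(a: Interval, b: Interval, t: float = 0.0) -> str:
    """Allen relation between a and b, with endpoints compared up to tolerance t."""
    s_s = cmp_thick(a[0], b[0], t)  # start of a vs. start of b
    e_e = cmp_thick(a[1], b[1], t)  # end of a vs. end of b
    e_s = cmp_thick(a[1], b[0], t)  # end of a vs. start of b
    s_e = cmp_thick(a[0], b[1], t)  # start of a vs. end of b
    if e_s < 0:
        return "before"
    if e_s == 0:
        return "meets"
    if s_e > 0:
        return "after"
    if s_e == 0:
        return "met_by"
    # From here on the two intervals share interior points.
    if s_s == 0 and e_e == 0:
        return "equals"
    if s_s == 0:
        return "starts" if e_e < 0 else "started_by"
    if e_e == 0:
        return "finishes" if s_s > 0 else "finished_by"
    if s_s < 0 and e_e < 0:
        return "overlaps"
    if s_s > 0 and e_e > 0:
        return "overlapped_by"
    if s_s > 0 and e_e < 0:
        return "during"
    return "contains"


def rectangle_relation(r1: Rect, r2: Rect, t: float = 0.0) -> Tuple[str, str]:
    """Pair of (x-axis, y-axis) interval relations between two bounding boxes."""
    x_rel = interval_relation((r1[0], r1[2]), (r2[0], r2[2]), t)
    y_rel = interval_relation((r1[1], r1[3]), (r2[1], r2[3]), t)
    return x_rel, y_rel


# Two text blocks stacked in one column, misaligned by a pixel or two.
title = (10.0, 10.0, 200.0, 50.0)
body = (12.0, 51.0, 199.0, 120.0)
print(rectangle_relation(title, body, t=0.0))  # exact boundaries
print(rectangle_relation(title, body, t=3.0))  # thick boundaries
```

With exact boundaries the two blocks fall into the relation pair (contains, before), which a rule written for aligned, adjacent blocks would miss. With a tolerance of 3 units the same blocks are classified as (equals, meets), which is the kind of robustness to layout noise that the thick boundary interpretation aims for.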