Knowledge-based derivation of document logical structure

  • Authors:
  • D. Niyogi;S. N. Srihari

  • Affiliations:
  • -;-

  • Venue:
  • ICDAR '95 Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1) - Volume 1
  • Year:
  • 1995

Quantified Score

Hi-index 0.00

Visualization

Abstract

The analysis of a document image to derive a symbolic description of its structure and contents involves using spatial domain knowledge to classify the different printed blocks (e.g., text paragraphs), group them into logical units (e.g., newspaper stories), and determine the reading order of the text blocks within each unit. These steps describe the conversion of the physical structure of a document into its logical structure. We have developed a computational model for document logical structure derivation, in which a rule-based control strategy utilizes the data obtained from analyzing a digitized document image, and makes inferences using a multi-level knowledge base of document layout rules. The knowledge-based document logical structure derivation system (DeLoS) based on this model consists of a hierarchical rule-based control system to guide the block classification, grouping and read-ordering operations; a global data structure to store the document image data and incremental inferences; and a domain knowledge base to encode the rules governing document layout.