A Document Classification and Extraction System with Learning Ability

  • Authors:
  • Xuhong Li; Peter A. Ng

  • Venue:
  • ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
  • Year:
  • 1999

Abstract

Document image processing begins with the OCR phase, and automatic document analysis and understanding remain difficult. Most existing systems perform well only within their specific application domains. In this paper, we describe a domain-independent automatic document image understanding system with learning ability. A segmentation method based on "logical closeness" is proposed. A novel and natural representation of document layout structure, the directed weight graph (DWG), is described. To classify a given document, its string representation is matched first, rather than comparing the document against all of the sample graphs. Frame templates and a document type hierarchy (DTH) are used to represent document logical structure and the hierarchical relations among these frame templates, respectively. Two learning methodologies are applied: learning from experience and an enhanced perceptron learning algorithm.
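
The two-stage classification idea mentioned in the abstract (a cheap string match first, graph comparison only on the candidates that survive) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the layout-string encoding, the sample data, the similarity threshold, and the toy graph distance are hypothetical and are not the representations or measures defined in the paper.

```python
from difflib import SequenceMatcher

# Hypothetical sample store: document type -> (layout string, weighted directed edges).
# The string encoding and the edge weights are illustrative assumptions,
# not the representation defined in the paper.
SAMPLES = {
    "memo": ("THBBS", {("T", "H"): 1.0, ("H", "B"): 2.0, ("B", "S"): 1.0}),
    "letter": ("HABBS", {("H", "A"): 1.0, ("A", "B"): 1.5, ("B", "S"): 1.0}),
    "form": ("TFFFF", {("T", "F"): 1.0, ("F", "F"): 0.5}),
}


def string_similarity(a, b):
    """Cheap first-pass score in [0, 1] between two layout strings."""
    return SequenceMatcher(None, a, b).ratio()


def graph_distance(g1, g2):
    """Toy graph comparison: sum of absolute edge-weight differences."""
    edges = set(g1) | set(g2)
    return sum(abs(g1.get(e, 0.0) - g2.get(e, 0.0)) for e in edges)


def classify(layout_string, graph, samples=SAMPLES, threshold=0.6):
    """Shortlist samples by string match, then compare graphs only on the shortlist."""
    shortlist = [
        name for name, (s, _) in samples.items()
        if string_similarity(layout_string, s) >= threshold
    ]
    if not shortlist:
        return None  # no sample is close enough; the type is treated as unknown
    return min(shortlist, key=lambda name: graph_distance(graph, samples[name][1]))
```

The point of the design is that the string comparison is inexpensive, so the costlier graph comparison runs only against the few samples whose layout strings already resemble the input.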