Image Document Categorization Using Hidden Tree Markov Models and Structured Representations

  • Authors:
  • Michelangelo Diligenti; Paolo Frasconi; Marco Gori

  • Venue:
  • ICAPR '01 Proceedings of the Second International Conference on Advances in Pattern Recognition
  • Year:
  • 2001

Abstract

Categorization is an important problem in image document processing and is often a preliminary step for solving subsequent tasks such as recognition, understanding, and information extraction. In this paper, the problem is formulated in the framework of concept learning, and each category corresponds to the set of image documents with similar physical structure. We propose a solution based on two algorithmic ideas. First, we transform the image document into a structured representation based on X-Y trees. Compared to "flat" or vector-based feature extraction techniques, structured representations allow us to preserve important relationships between image sub-constituents. Second, we introduce a novel probabilistic architecture that extends hidden Markov models for learning probability distributions defined on spaces of labeled trees.