Document Classification Using Layout Analysis

  • Authors:
  • Jianying Hu; Ramanujan Kashi; Gordon Wilfong

  • Venue:
  • DEXA '99 Proceedings of the 10th International Workshop on Database & Expert Systems Applications
  • Year:
  • 1999

Abstract

This paper describes methods for document image classification at the spatial layout level. The goal is to develop fast algorithms for initial document type classification without OCR; the result can then be verified using more elaborate methods based on more detailed geometric and syntactic models. A novel feature set called interval encoding is introduced to capture elements of spatial layout. This feature set encodes region layout information in fixed-length vectors by capturing structural characteristics of the image. We demonstrate the usefulness of features derived from interval encoding in a hidden Markov model based page layout classification system that is trainable and extendable.
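
The abstract does not define interval encoding, so the following is only an illustrative sketch of how region layout could be turned into fixed-length vectors. It assumes the page has already been reduced to a coarse binary grid (1 = text, 0 = white space) and that each grid cell is encoded by the length of the horizontal text run containing it. The grid representation, the run-length rule, and the function names `interval_encode_row` and `interval_encode_page` are assumptions made for this example, not the authors' definitions.

```python
from typing import List


def interval_encode_row(row: List[int]) -> List[int]:
    """Encode one grid row (hypothetical scheme, not the paper's definition).

    Each text cell (1) is replaced by the length of the contiguous text
    run it belongs to; white-space cells (0) stay 0. The output has the
    same length as the input, so every row yields a fixed-length vector.
    """
    encoded = [0] * len(row)
    i = 0
    while i < len(row):
        if row[i] == 1:
            # Find the end of the current run of text cells.
            j = i
            while j < len(row) and row[j] == 1:
                j += 1
            run_length = j - i
            for k in range(i, j):
                encoded[k] = run_length
            i = j
        else:
            i += 1
    return encoded


def interval_encode_page(grid: List[List[int]]) -> List[List[int]]:
    """Encode every row of a binary layout grid, top to bottom."""
    return [interval_encode_row(row) for row in grid]


if __name__ == "__main__":
    # Toy 3x6 layout: a wide block and a narrow block, a blank row,
    # then two narrow columns.
    page = [
        [1, 1, 1, 0, 1, 1],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 1],
    ]
    for vector in interval_encode_page(page):
        print(vector)
```

Under these assumptions, the sequence of row vectors read from top to bottom could serve as the observation sequence for a hidden Markov model, with one model trained per document type. Unlike a plain binary grid, the run-length values distinguish a cell in a wide text block from a cell in a narrow column.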