Document zone content classification and its performance evaluation

  • Authors:
  • Yalin Wang; Ihsin T. Phillips; Robert M. Haralick

  • Affiliations:
  • Department of Electrical Engineering, University of Washington, Seattle, WA 98195, USA; Department of Computer Science, Queens College, CUNY, Flushing, NY 11367, USA; The Graduate School, CUNY, New York, NY 10016, USA

  • Venue:
  • Pattern Recognition
  • Year:
  • 2006

Abstract

This paper describes an algorithm for determining the content type of a given zone within a document image. We take a statistics-based approach and represent each zone with a 25-dimensional feature vector. An optimized decision tree classifier assigns each zone to one of nine zone content classes. A performance evaluation protocol is proposed. The training and testing data sets comprise a total of 24,177 zones from the University of Washington English Document Image Database III. The algorithm achieves an accuracy of 98.45% with a mean false alarm rate of 0.50%.
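
To make the two reported figures concrete, the following is a minimal sketch of how an accuracy and a mean false alarm rate could be computed for a multi-class zone classifier. It is not the paper's evaluation protocol, which the abstract does not spell out. It assumes that the false alarm rate of a class is the fraction of zones not belonging to that class that were nevertheless assigned to it, and that the mean is taken over classes. The function name `evaluate` and the three class labels are hypothetical, and the labels in the usage example are toy values, not data from the paper.

```python
from collections import Counter


def evaluate(true_labels, predicted_labels, classes):
    """Return (accuracy, mean_false_alarm_rate) for a multi-class classifier.

    Assumption (not taken from the paper): the false alarm rate of class c is
    the number of zones wrongly assigned to c divided by the number of zones
    whose true class is not c.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one labelled zone is required")

    total = len(true_labels)
    pairs = list(zip(true_labels, predicted_labels))

    # Accuracy: fraction of zones whose predicted class matches the true class.
    accuracy = sum(t == p for t, p in pairs) / total

    true_counts = Counter(true_labels)
    # Count, per predicted class, the zones that were assigned to it in error.
    false_alarms = Counter(p for t, p in pairs if t != p)

    rates = []
    for c in classes:
        negatives = total - true_counts[c]
        if negatives == 0:
            continue  # rate is undefined when every zone belongs to class c
        rates.append(false_alarms[c] / negatives)

    mean_false_alarm_rate = sum(rates) / len(rates) if rates else 0.0
    return accuracy, mean_false_alarm_rate


# Toy usage with hypothetical class labels (not the paper's nine classes).
classes = ["text", "table", "halftone"]
true_labels = ["text", "text", "table", "halftone", "text", "table"]
predicted_labels = ["text", "table", "table", "halftone", "text", "text"]

accuracy, mean_far = evaluate(true_labels, predicted_labels, classes)
print(f"accuracy = {accuracy:.4f}, mean false alarm rate = {mean_far:.4f}")
```

Averaging the per-class rates gives every class equal weight regardless of how many zones it contains. A pooled rate over all zones would be a different, equally plausible definition, so the choice here should be read as illustrative only.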