Zone classification in a document using the method of feature vector generation

Authors:
R. Sivaramakrishnan; I. T. Phillips; J. Ha; S. Subramanium; R. M. Haralick
Venue:
ICDAR '95: Proceedings of the Third International Conference on Document Analysis and Recognition, Volume 2
Year:
1995

Cited by 1:
Document zone content classification and its performance evaluation (Pattern Recognition)

Abstract

A document can be divided into zones on the basis of its content; for example, a zone can be either text or non-text. This paper describes an algorithm that classifies each given document zone into one of nine classes. For each zone, the algorithm extracts features such as the run-length mean and variance, the spatial mean and variance, the fraction of the total number of black pixels in the zone, and the zone width ratio. Run-length features are computed along four canonical directions. A decision tree classifier then assigns a class to each zone on the basis of its feature vector. The classification performance on an independent test set was 97%.
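To make the run-length features concrete, here is a minimal sketch of how run-length mean and variance might be computed along four canonical directions (horizontal, vertical, main diagonal, anti-diagonal) for a binary zone image, together with the black-pixel fraction. It is an illustration under stated assumptions, not the authors' implementation. It assumes the zone is a list of rows of 0/1 pixels with 1 meaning black, and that runs are counted over black pixels only. The function and feature names (`zone_features`, `horizontal_run_mean`, etc.) are hypothetical. The spatial mean and variance and the zone width ratio are left out because the abstract does not define them precisely.

```python
from statistics import mean, pvariance

# Hypothetical direction labels for the four canonical scan directions (assumption).
DIRECTIONS = ("horizontal", "vertical", "diagonal", "anti_diagonal")


def black_runs(line):
    """Return the lengths of maximal runs of black (1) pixels in a 1-D sequence."""
    runs, current = [], 0
    for px in line:
        if px:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def scan_lines(zone, direction):
    """Yield the pixel sequences obtained by scanning the zone along one direction."""
    h, w = len(zone), len(zone[0])
    if direction == "horizontal":
        for row in zone:
            yield row
    elif direction == "vertical":
        for c in range(w):
            yield [zone[r][c] for r in range(h)]
    elif direction == "diagonal":  # top-left to bottom-right: cells (r, r + d)
        for d in range(-(h - 1), w):
            yield [zone[r][r + d] for r in range(h) if 0 <= r + d < w]
    elif direction == "anti_diagonal":  # top-right to bottom-left: cells with r + c == s
        for s in range(h + w - 1):
            yield [zone[r][s - r] for r in range(h) if 0 <= s - r < w]
    else:
        raise ValueError(f"unknown direction: {direction}")


def zone_features(zone):
    """Build a feature dict: run-length mean/variance per direction plus black fraction."""
    features = {}
    for direction in DIRECTIONS:
        runs = [n for line in scan_lines(zone, direction) for n in black_runs(line)]
        # Population variance is an assumption; the abstract does not say which is used.
        features[f"{direction}_run_mean"] = float(mean(runs)) if runs else 0.0
        features[f"{direction}_run_var"] = float(pvariance(runs)) if runs else 0.0
    total = len(zone) * len(zone[0])
    features["black_fraction"] = sum(map(sum, zone)) / total
    return features


if __name__ == "__main__":
    zone = [
        [1, 1, 0],
        [0, 1, 1],
        [0, 0, 0],
    ]
    for name, value in zone_features(zone).items():
        print(f"{name}: {value:.3f}")
```

Scanning diagonals by a fixed offset `d = c - r` (or sum `s = r + c`) visits every pixel exactly once per direction, so each black pixel contributes to exactly one run in each of the four scans. The resulting values, ordered consistently, could be fed to any decision tree learner (for instance scikit-learn's `DecisionTreeClassifier`); how the paper builds its own tree is not described here.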