Zone Content Classification and its Performance Evaluation

  • Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
  • Year: 2001

Abstract

This paper presents an improved zone content classification method and its performance evaluation. We added two new features to the feature vector of a previously published method [1]. We assumed different independence relationships in two zone sets: in one set, we used an optimized binary decision tree to estimate the maximum zone content class probability; in the other, we used the Viterbi algorithm to find the optimal solution for a zone sequence. The training, pruning, and testing data sets for the algorithm include 1,600 images drawn from the UWCDROM-III document image database. The classifier assigns each given scientific and technical document zone to one of nine classes: two text classes (font sizes 4-18 pt and 19-32 pt), math, table, halftone, map/drawing, ruling, logo, and others. Compared with our previous work [2], it raised the accuracy rate from 97.53% to 98.52% and reduced the mean false alarm rate from 1.26% to 0.53%.
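To illustrate the sequence-decoding step mentioned in the abstract, the following is a minimal sketch of generic Viterbi decoding over a sequence of zones. It is not the authors' implementation: the function name `viterbi`, the two labels, and every probability below are made-up assumptions, and the paper's actual features, nine classes, and transition model are not reproduced here.

```python
import math


def viterbi(obs_loglik, log_trans, log_init):
    """Return the most probable label sequence and its log-probability.

    obs_loglik: list over zones; each item maps label -> log P(features | label)
    log_trans:  maps (prev_label, label) -> log P(label | prev_label)
    log_init:   maps label -> log P(label) for the first zone
    """
    labels = list(log_init)
    # best[t][y] = best log-score of any label path ending in y at zone t
    best = [{y: log_init[y] + obs_loglik[0][y] for y in labels}]
    back = []  # back[t-1][y] = best predecessor of label y at zone t
    for t in range(1, len(obs_loglik)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + log_trans[(p, y)])
            scores[y] = best[-1][prev] + log_trans[(prev, y)] + obs_loglik[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label to recover the full path.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model with two zone classes (illustrative numbers only).
log_init = {"text": math.log(0.6), "table": math.log(0.4)}
log_trans = {
    ("text", "text"): math.log(0.8), ("text", "table"): math.log(0.2),
    ("table", "text"): math.log(0.3), ("table", "table"): math.log(0.7),
}
# Per-zone likelihoods, e.g. as they might come from a per-zone classifier.
obs_loglik = [
    {"text": math.log(0.9), "table": math.log(0.1)},
    {"text": math.log(0.4), "table": math.log(0.6)},
    {"text": math.log(0.2), "table": math.log(0.8)},
]

path, logp = viterbi(obs_loglik, log_trans, log_init)
print(path)  # ['text', 'table', 'table']
```

Working in log-probabilities avoids numerical underflow on long zone sequences; the decoded path is the same as it would be with raw probabilities.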