A Study on the Document Zone Content Classification Problem
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Abstract: This paper presents an improved zone content classification method and its performance evaluation. We added two new features to the feature vector of a previously published method [1]. We assumed different independence relationships in two zone sets. In one set, we used an optimized binary decision tree to estimate the maximum zone content class probability; in the other, we used the Viterbi algorithm to find the optimal solution for a zone sequence. The training, pruning, and testing data sets for the algorithm include 1,600 images drawn from the UWCDROM-III document image database. The classifier assigns each given scientific and technical document zone to one of nine classes: two text classes (font sizes 4-18 pt and 19-32 pt), math, table, halftone, map/drawing, ruling, logo, and others. Compared with our previous work [2], it raised the accuracy rate from 97.53% to 98.52% and reduced the mean false alarm rate from 1.26% to 0.53%.
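The abstract does not give the paper's sequence model, so the following is only a minimal sketch of how the Viterbi algorithm finds the most probable label sequence for a run of zones. It assumes a simple first-order model with discretized observations. The two classes, the observation symbols "dense" and "sparse", and all probabilities are hypothetical values chosen for illustration; they are not taken from the paper.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state sequence and its log-probability."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at step t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical toy model: two zone classes and two coarse observation symbols.
states = ["text", "table"]
log_start = logs({"text": 0.6, "table": 0.4})
log_trans = {
    "text": logs({"text": 0.7, "table": 0.3}),
    "table": logs({"text": 0.4, "table": 0.6}),
}
log_emit = {
    "text": logs({"dense": 0.9, "sparse": 0.1}),
    "table": logs({"dense": 0.2, "sparse": 0.8}),
}

path, logp = viterbi(["dense", "sparse", "sparse"], states,
                     log_start, log_trans, log_emit)
print(path, math.exp(logp))
```

Working in log-probabilities avoids numerical underflow on long zone sequences. In a real system the emission scores would come from the zone feature vectors rather than from a two-symbol table.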