Zone Content Classification and its Performance Evaluation

Authors:
Affiliations:
Venue: ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Year: 2001

Cited 6

A Study on the Document Zone Content Classification Problem
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V

Making Documents Work: Challenges for Document Understanding
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2

Machine Printed Text and Handwriting Identification in Noisy Document Images
IEEE Transactions on Pattern Analysis and Machine Intelligence

Transforming arbitrary tables into logical form with TARTAR
Data & Knowledge Engineering

A hierarchical genetic algorithm for segmentation of multi-spectral human-brain MRI
Expert Systems with Applications: An International Journal

From tables to frames
Web Semantics: Science, Services and Agents on the World Wide Web


Abstract

This paper presents an improved zone content classification method and its performance evaluation. We added two new features to the feature vector of a previously published method [1]. We assumed different independence relationships in two zone sets. In one set, we used an optimized binary decision tree to estimate the maximum zone content class probability; in the other, we used the Viterbi algorithm to find the optimal solution for a zone sequence. The training, pruning, and testing data sets for the algorithm comprise 1,600 images drawn from the UWCDROM-III document image database. The classifier assigns each given scientific and technical document zone to one of nine classes: two text classes (font sizes 4-18 pt and 19-32 pt), math, table, halftone, map/drawing, ruling, logo, and others. Compared with our previous work [2], it raised the accuracy rate from 97.53% to 98.52% and reduced the mean false alarm rate from 1.26% to 0.53%.
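
The abstract mentions using the Viterbi algorithm to find the optimal labeling for a sequence of zones. The sketch below illustrates that general idea only; it is not the authors' implementation. The function name `viterbi`, the three-class subset, and all probability values are hypothetical and chosen for illustration, since the abstract does not give the model's parameters.

```python
import math

def viterbi(emissions, transitions, priors):
    """Return the most probable class sequence for a sequence of zones.

    emissions[t][c]   : likelihood of zone t's features under class c
    transitions[a][b] : probability that class b follows class a
    priors[c]         : probability that the first zone has class c
    All inputs here are illustrative assumptions, not values from the paper.
    """
    classes = list(priors)

    def log(p):
        # Work in log space to avoid underflow on long zone sequences.
        return math.log(p) if p > 0 else float("-inf")

    score = {c: log(priors[c]) + log(emissions[0][c]) for c in classes}
    back = []  # back[t][c] = best predecessor of class c at step t + 1
    for obs in emissions[1:]:
        prev, score, pointers = score, {}, {}
        for c in classes:
            best = max(classes, key=lambda a: prev[a] + log(transitions[a][c]))
            score[c] = prev[best] + log(transitions[best][c]) + log(obs[c])
            pointers[c] = best
        back.append(pointers)

    # Trace the best path backwards from the best final class.
    path = [max(classes, key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical three-zone example with made-up probabilities.
priors = {"text": 0.6, "math": 0.2, "table": 0.2}
transitions = {
    "text":  {"text": 0.7, "math": 0.2, "table": 0.1},
    "math":  {"text": 0.6, "math": 0.3, "table": 0.1},
    "table": {"text": 0.5, "math": 0.1, "table": 0.4},
}
emissions = [
    {"text": 0.8, "math": 0.1, "table": 0.1},
    {"text": 0.4, "math": 0.5, "table": 0.1},
    {"text": 0.7, "math": 0.2, "table": 0.1},
]

path = viterbi(emissions, transitions, priors)
per_zone = [max(e, key=e.get) for e in emissions]
print(path)      # sequence-level decision
print(per_zone)  # independent per-zone decision
```

With these made-up numbers, labeling each zone independently picks "math" for the middle zone, while the sequence-level decoding keeps it as "text" because the transition probabilities favor a run of text zones. This is the kind of contextual effect a sequence model can capture and an independent per-zone classifier cannot.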