While numerous page segmentation algorithms have been proposed in the literature, there is a lack of comparative evaluation, empirical or theoretical, of these algorithms. Existing performance evaluation methods usually omit two crucial components: 1) automatic training of algorithms that have free parameters and 2) statistical and error analysis of the experimental results. In this paper, we use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: 1) we create mutually exclusive training and test data sets with groundtruth, 2) we select a meaningful and computable performance metric, 3) we use an optimization procedure to search automatically for the optimal parameter values of the segmentation algorithms on the training data set, 4) we evaluate the segmentation algorithms on the test data set, and, finally, 5) we perform a statistical and error analysis to establish the statistical significance of the experimental results. In particular, instead of the ad hoc, manual approach typically used in the literature for training algorithms, we pose the automatic training of algorithms as an optimization problem and use the Simplex algorithm to search for the optimal parameter values. A paired-model statistical analysis and an error analysis are then conducted to provide confidence intervals for the experimental results of the algorithms. This methodology is applied to the evaluation of five page segmentation algorithms, three of which are representative research algorithms and the other two well-known commercial products, on 978 images from the University of Washington III data set.
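Step 3 treats training as an optimization problem solved with the Simplex algorithm. The sketch below assumes this means the downhill simplex method of Nelder and Mead, a derivative-free minimizer that suits objectives such as a segmentation accuracy score, which offer no usable gradient. It is a minimal pure-Python version, not the authors' implementation. `toy_textline_accuracy` is a hypothetical smooth stand-in for "run the segmenter with these parameters over the training images and measure average textline accuracy"; its parameter names and its optimum at (3.0, 0.5) are invented for illustration. Accuracy is maximized by minimizing its negative.

```python
from typing import Callable, List, Sequence, Tuple


def nelder_mead(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                step: float = 1.0, tol: float = 1e-10,
                max_iter: int = 2000) -> Tuple[List[float], float]:
    """Minimize f with the downhill simplex (Nelder-Mead) method."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced by `step` along each axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    alpha, gamma, rho, sigma = 1.0, 2.0, 0.5, 0.5  # standard coefficients

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t: float) -> List[float]:
            # Point centroid + t * (centroid - worst) on the line through both.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(alpha)
        f_r = f(reflected)
        if values[0] <= f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r      # accept reflection
        elif f_r < values[0]:
            expanded = along(gamma)
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e   # accept expansion
            else:
                simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = along(-rho)                      # between centroid and worst
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex toward the best one.
                best = simplex[0]
                simplex = [best] + [[b + sigma * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    i = min(range(n + 1), key=lambda k: values[k])
    return simplex[i], values[i]


def toy_textline_accuracy(params: Sequence[float]) -> float:
    """HYPOTHETICAL stand-in for average textline accuracy on a training set."""
    gap, ratio = params  # invented parameter names; peak accuracy 1.0 at (3.0, 0.5)
    return 1.0 / (1.0 + (gap - 3.0) ** 2 + (ratio - 0.5) ** 2)


if __name__ == "__main__":
    # Maximize accuracy by minimizing its negative.
    best, neg_acc = nelder_mead(lambda p: -toy_textline_accuracy(p), [1.0, 1.0])
    print(best, -neg_acc)
```

In a real training run the objective is usually piecewise constant in integer-valued parameters (for example pixel thresholds), so one would round parameters inside the objective and might restart the search from several initial points; both choices are assumptions here, not details taken from the paper.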
It is found that the performance indices (average textline accuracy) of the Voronoi, Docstrum, and Caere segmentation algorithms are not significantly different from each other, but they are significantly better than that of ScanSoft's segmentation algorithm, which, in turn, is significantly better than that of X-Y cut.
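The abstract does not spell out the paired-model analysis, so the following is only a generic illustration of how a claim such as "not significantly different" can be backed by a paired comparison: both algorithms are scored on the same test images, and a normal-approximation 95% confidence interval is computed for the mean per-image accuracy difference. If the interval excludes zero, the two algorithms are judged significantly different. The function names and the toy scores are hypothetical, and this may differ from the authors' exact model.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_difference_ci(acc_a: Sequence[float], acc_b: Sequence[float],
                         z: float = 1.96) -> Tuple[float, float, float]:
    """Mean per-image accuracy difference (A - B) and a normal-approximation CI.

    z = 1.96 gives an approximate 95% interval for large test sets.
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equally long score lists with at least 2 images")
    # Pairing: both algorithms are scored on the same image, so variation in
    # image difficulty largely cancels out of each difference.
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean = statistics.mean(diffs)
    half_width = z * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean - half_width, mean + half_width


def significantly_different(acc_a: Sequence[float], acc_b: Sequence[float],
                            z: float = 1.96) -> bool:
    """True when the confidence interval for the mean difference excludes zero."""
    _, low, high = paired_difference_ci(acc_a, acc_b, z)
    return low > 0.0 or high < 0.0


if __name__ == "__main__":
    # Invented per-image textline accuracies for two hypothetical algorithms.
    alg_a = [0.92, 0.81, 0.95, 0.70, 0.88]
    alg_b = [0.90, 0.78, 0.91, 0.69, 0.85]
    print(paired_difference_ci(alg_a, alg_b), significantly_different(alg_a, alg_b))
```

With a test set as large as the 978 images used here, the normal approximation is reasonable; for small test sets a Student t quantile would replace 1.96.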