A major difficulty in designing a document image segmentation methodology is selecting proper values for all the parameters involved. This is usually done through experimentation or through a supervised training phase, which is tedious because the corresponding segmentation ground truth has to be created. In this paper, we propose a novel automatic, unsupervised parameter selection methodology that can be applied to the character segmentation problem. It is based on clustering the entities produced by the segmentation method when it is run with different values of its parameters. The clustering uses features extracted from the segmented entities: zone-based features and features computed from the area formed by the projections of the upper/lower and left/right profiles. Optimizing an appropriate intra-class distance measure yields the optimal parameter vector. The method is evaluated on two segmentation algorithms: a recently proposed character segmentation technique based on skeleton segmentation paths, and the well-known Run-Length Smoothing Algorithm (RLSA). The proposed parameter selection method is capable of finding the segmentation parameters that correspond to the optimal or near-optimal segmentation result. Optimality is measured by counting the matches between the entities detected by the segmentation algorithm and the entities in the ground truth.
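The abstract names two feature groups for describing each segmented entity: zones, and the area formed by the projections of the upper/lower and left/right profiles. It does not give the exact definitions, so the following is only a minimal sketch under stated assumptions. Zone features are taken as black-pixel densities on a `zones x zones` grid. The "profile area" is taken as the pixels lying between the first and last black pixel of each column (upper/lower profiles) or each row (left/right profiles), and its zone densities are used as features. The function names and the `zones` parameter are hypothetical. In practice each entity would likely be size-normalized before feature extraction.

```python
from typing import List

Image = List[List[int]]  # binary entity image: 1 = black, 0 = white


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def fill_between_profiles(img: Image) -> Image:
    """Per row, mark every pixel between the left and right profiles
    (first and last black pixel); rows without black pixels stay empty."""
    out = []
    for row in img:
        black = [c for c, p in enumerate(row) if p]
        filled = [0] * len(row)
        if black:
            for c in range(black[0], black[-1] + 1):
                filled[c] = 1
        out.append(filled)
    return out


def zone_densities(img: Image, zones: int) -> List[float]:
    """Fraction of black pixels in each cell of a zones x zones grid."""
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(zones):
        r0, r1 = zr * h // zones, (zr + 1) * h // zones
        for zc in range(zones):
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cells = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def entity_features(img: Image, zones: int = 4) -> List[float]:
    """Hypothetical feature vector: zone densities of the entity itself, of the
    area enclosed by its upper/lower profiles, and of the area enclosed by its
    left/right profiles (an assumed reading of the abstract)."""
    left_right_area = fill_between_profiles(img)
    upper_lower_area = transpose(fill_between_profiles(transpose(img)))
    return (zone_densities(img, zones)
            + zone_densities(upper_lower_area, zones)
            + zone_densities(left_right_area, zones))
```

Take a "C"-shaped 3x3 entity as an example. The upper/lower area closes the opening on the right. The left/right area keeps the opening, because the middle row ends at its only black pixel.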
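The abstract outlines the parameter selection loop. For each candidate parameter vector, the entities are segmented, described by feature vectors and clustered, and the candidate that optimizes an intra-class distance measure is kept. The sketch below shows that loop under explicit assumptions. Clustering is plain k-means with a fixed, user-chosen `k`. The measure is the mean distance of each entity to its cluster centroid, which is minimized. `segment_and_describe` is a hypothetical callback that runs the segmentation with the given parameters and returns one feature vector per entity. The abstract does not state the actual clustering algorithm or distance measure.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Params = Tuple


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd's k-means, deterministically seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members; empty clusters stay put.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def intra_class_distance(points: List[Vector], k: int) -> float:
    """Assumed measure: mean distance of each point to its cluster centroid."""
    centroids, labels = kmeans(points, k)
    return sum(math.dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)


def select_parameters(candidates: Sequence[Params],
                      segment_and_describe: Callable[[Params], List[Vector]],
                      k: int) -> Tuple[Params, Dict[Params, float]]:
    """Return the candidate whose segmented entities cluster most tightly."""
    scores: Dict[Params, float] = {}
    for params in candidates:
        feats = segment_and_describe(params)
        # Too few entities to form k clusters: treat the candidate as unusable.
        scores[params] = intra_class_distance(feats, k) if len(feats) >= k else math.inf
    best = min(scores, key=scores.get)
    return best, scores
```

A toy run is shown below. Here a dictionary stands in for the segmentation-plus-features step. The candidate whose entities form two tight groups is chosen over one whose entities are spread out.

```python
toy = {(1,): [[0.0], [0.1], [5.0], [5.1]], (2,): [[0.0], [2.0], [4.0], [6.0]]}
best, scores = select_parameters(list(toy), toy.__getitem__, k=2)
```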
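RLSA is one of the two segmentation algorithms the method is evaluated on. Its parameters are run-length thresholds, the kind of values the proposed method is meant to select automatically. Below is a minimal sketch of a common formulation, not the exact variant used in the paper. Horizontal and vertical smoothing each fill white runs whose length is at most a threshold, and the two results are combined with a logical AND. Here only runs bounded by black pixels on both sides are filled, which is an assumption. The names `h_threshold` and `v_threshold` are illustrative.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = black (foreground), 0 = white


def smooth_row(row: List[int], threshold: int) -> List[int]:
    """Fill white runs of length <= threshold lying between two black pixels."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_black - 1 if last_black is not None else 0
            if 0 < gap <= threshold:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def rlsa(img: Image, h_threshold: int, v_threshold: int) -> Image:
    """Run-length smoothing: logical AND of horizontal and vertical smoothing."""
    horizontal = [smooth_row(r, h_threshold) for r in img]
    vertical = transpose([smooth_row(c, v_threshold) for c in transpose(img)])
    return [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]
```

The AND keeps a filled pixel only if it was filled both horizontally and vertically. For example, the centre pixel of `[[0,1,0],[1,0,1],[0,1,0]]` becomes black with both thresholds set to 1. The gaps in `[[1,0,1],[0,0,0],[1,0,1]]` stay white, because no vertical fill covers them.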
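The abstract measures optimality by counting matches between detected and ground-truth entities, but does not define a match. The sketch below assumes entities are sets of foreground pixel coordinates. A match score is taken as the intersection-over-union of two pixel sets. Ground-truth entities are greedily matched one-to-one to unused detections whose score reaches a threshold. The threshold value of 0.9 and all names here are illustrative assumptions.

```python
from typing import List, Optional, Set, Tuple

Entity = Set[Tuple[int, int]]  # set of (row, col) foreground pixels


def match_score(a: Entity, b: Entity) -> float:
    """Intersection-over-union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def count_matches(detected: List[Entity], ground_truth: List[Entity],
                  threshold: float = 0.9) -> int:
    """Greedy one-to-one matching: each detection is matched at most once."""
    used: Set[int] = set()
    matches = 0
    for gt in ground_truth:
        best_i: Optional[int] = None
        best_s = threshold
        for i, det in enumerate(detected):
            if i in used:
                continue
            s = match_score(gt, det)
            if s >= best_s:
                best_i, best_s = i, s
        if best_i is not None:
            used.add(best_i)
            matches += 1
    return matches
```

Under this criterion a single detection that merges two ground-truth characters matches neither of them. Likewise, a detection covering only half of a character fails a 0.9 threshold.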