Automatic unsupervised parameter selection for character segmentation

  • Authors:
  • G. Vamvakas; N. Stamatopoulos; B. Gatos; S. J. Perantonis

  • Affiliations:
  • National Center for Scientific Research "Demokritos", Agia Paraskevi, Athens, Greece (all authors)

  • Venue:
  • DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
  • Year:
  • 2010

Abstract

A major difficulty in designing a document image segmentation methodology is selecting proper values for all of the parameters involved. This is usually done through experimentation or through a supervised training phase, which is a tedious process because the corresponding segmentation ground truth has to be created. In this paper, we propose a novel automatic unsupervised parameter selection methodology that can be applied to the character segmentation problem. It is based on clustering the entities that the segmentation method produces for different values of its parameters. The clustering uses features extracted from the segmented entities, based on zones and on the area formed by the projections of the upper/lower and left/right profiles. Optimization of an appropriate intra-class distance measure yields the optimal parameter vector. The method is evaluated on two segmentation algorithms: a recently proposed character segmentation technique based on skeleton segmentation paths, and the well-known RLSA technique. The proposed parameter selection method is capable of finding the segmentation parameters that correspond to the optimal or near-optimal segmentation result, as determined by counting the number of matches between the entities detected by the segmentation algorithm and the entities in the ground truth.
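
The selection loop described in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the exact intra-class distance measure, so the choices below are assumptions made only for illustration: a plain k-means with a deterministic initialization, and the mean distance of each entity to its cluster centroid as the score to minimize. The names `kmeans`, `intra_cluster_distance`, `select_parameters` and `segment_and_describe` are hypothetical, and the feature vectors in the usage example are invented toy values, not zone or profile features from the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _assign(points: List[Vector], centroids: List[List[float]]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: _dist(p, centroids[c]))
            for p in points]


def kmeans(points: List[Vector], k: int, iters: int = 20):
    """Plain k-means (an assumed stand-in for the paper's clustering step)."""
    # Deterministic initialization: seeds spread evenly over the input order.
    step = max(1, len(points) // k)
    centroids = [list(points[min(i * step, len(points) - 1)]) for i in range(k)]
    for _ in range(iters):
        labels = _assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return _assign(points, centroids), centroids


def intra_cluster_distance(points: List[Vector], k: int) -> float:
    """Mean distance of each entity to its centroid (assumed measure)."""
    labels, centroids = kmeans(points, k)
    total = sum(_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return total / len(points)


def select_parameters(
    candidates: List[Tuple],
    segment_and_describe: Callable[[Tuple], List[Vector]],
    k: int,
) -> Tuple[Tuple, Dict[Tuple, float]]:
    """Score every candidate parameter vector and return the best one.

    segment_and_describe is a hypothetical callback: it runs the
    segmentation with the given parameters and returns one feature
    vector per segmented entity.
    """
    scores: Dict[Tuple, float] = {}
    for params in candidates:
        features = segment_and_describe(params)
        if len(features) >= k:  # need at least k entities to form k clusters
            scores[params] = intra_cluster_distance(features, k)
    best = min(scores, key=scores.get)
    return best, scores


# Toy usage: invented feature vectors standing in for two parameter settings.
toy = {
    (3,): [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.0, 0.9)],  # tight clusters
    (9,): [(0.0, 0.0), (0.6, 0.2), (1.0, 1.0), (0.3, 0.9)],  # scattered
}
best, scores = select_parameters(list(toy), toy.__getitem__, k=2)
print(best)
```

In this toy case the setting whose entities form tighter clusters is selected. A raw intra-cluster distance like this one can favor degenerate segmentations (for example, very few entities), which is one reason the measure used here should be read as a placeholder for whatever the paper actually optimizes.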