Machine Printed Text and Handwriting Identification in Noisy Document Images
IEEE Transactions on Pattern Analysis and Machine Intelligence
Learning to segment document images
PReMI'05 Proceedings of the First international conference on Pattern Recognition and Machine Intelligence
Abstract: A single-parameter text-line extraction algorithm is described, along with an efficient technique for estimating the optimal value of the parameter for each individual image without the need for ground truth. The algorithm is based on three simple tree operations: cut, glue, and flip. An XY-tree representing the segmentation is incrementally transformed to reflect a change in the parameter. Intrinsic measures of the cost of each transformation are used to detect when a specific tree operation would cause an error if performed, so that such errors can be avoided. On a set of 97 test images selected from a variety of technical journals, the algorithm correctly identified 98.8% of the area of the ground-truth bounding boxes and committed no column-bridging errors.