This paper presents image processing methods that produce accurate character segmentation results for historical newspaper archives. Full-text search using a word spotting technique is a promising way to make digital archives easier to use. Some word spotting techniques require the target images to be segmented into character images in advance; however, character segmentation is difficult, especially for old and degraded document images. This paper identifies the causes that make character segmentation difficult and removes them in order to improve segmentation accuracy. We first detect the ruled lines using the Hough transform in order to segment a whole newspaper image into column-separated images. We then remove the ruled lines, as well as ruby characters and noise. The proposed system is tested on 20 column-separated images of historical newspapers, and the accuracy of character segmentation improves to 96.3%.
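The ruled-line detection step can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption: the code uses a plain Hough accumulator in pure Python, the function names `hough_accumulate` and `ruled_lines` are hypothetical, and the vote threshold and angle tolerance are arbitrary illustrative values. Each foreground pixel votes for every line rho = x cos(theta) + y sin(theta) passing through it, and strong near-horizontal or near-vertical peaks are treated as ruled-line candidates.

```python
import math


def hough_accumulate(binary, n_theta=180):
    """Return {(rho, theta_deg): votes} over all foreground pixels.

    `binary` is a list of rows; any truthy value counts as foreground.
    """
    step = 180.0 / n_theta
    # Precompute (angle in degrees, cos, sin) for each sampled angle.
    trig = [
        (t * step, math.cos(math.radians(t * step)), math.sin(math.radians(t * step)))
        for t in range(n_theta)
    ]
    votes = {}
    for y, row in enumerate(binary):
        for x, pixel in enumerate(row):
            if not pixel:
                continue
            for theta_deg, c, s in trig:
                # Normal form of a line: rho = x*cos(theta) + y*sin(theta)
                rho = int(round(x * c + y * s))
                key = (rho, theta_deg)
                votes[key] = votes.get(key, 0) + 1
    return votes


def ruled_lines(binary, min_votes, angle_tol=1.0):
    """Keep strong peaks that are close to horizontal or vertical.

    theta near 90 degrees means a horizontal line (rho is roughly its y);
    theta near 0 or 180 degrees means a vertical line (|rho| is roughly its x).
    """
    lines = []
    for (rho, theta_deg), count in hough_accumulate(binary).items():
        if count < min_votes:
            continue
        if abs(theta_deg - 90.0) <= angle_tol:
            lines.append(("horizontal", rho, theta_deg, count))
        elif theta_deg <= angle_tol or 180.0 - theta_deg <= angle_tol:
            lines.append(("vertical", rho, theta_deg, count))
    return lines


# Synthetic 20x30 "page": one horizontal rule at y=7, one vertical rule at x=12.
HEIGHT, WIDTH = 20, 30
image = [[0] * WIDTH for _ in range(HEIGHT)]
for x in range(WIDTH):
    image[7][x] = 1
for y in range(HEIGHT):
    image[y][12] = 1

found = ruled_lines(image, min_votes=18)
```

Neighbouring angles typically produce near-duplicate peaks for the same physical line, so a practical pipeline would merge nearby candidates before cutting the page into columns at the detected vertical rules. On full-size scans a vectorised or library implementation would be needed; this version is meant only to make the voting scheme concrete.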