Several studies on extracting Bangla text from scene images are available in the literature, and OCR of printed Bangla text has been studied extensively. However, the performance of available Bangla OCR systems on scene text is not acceptable. In this article, we present our recent study on segmenting characters, or their parts, from Bangla text extracted from scene images. The proposed approach separates text from background by combining two algorithms: K-means clustering, an unsupervised learning algorithm, and Otsu's threshold selection. We propose a criterion for choosing an optimal value of K for K-means clustering. Text segmentation is based on region growing and on extraction of both the headline and the baseline of the text. These two lines divide a Bangla word into three horizontal zones, and the algorithm segments characters or their parts within each zone. This zone-based segmentation reduces the number of symbols that the classifier must handle in the next stage of the OCR system. The algorithm can also detect an image that contains only numerals, and it skips zone detection in that case. Extracted scene text is often affected by artifacts, which our segmentation algorithm removes efficiently. The algorithm has been tested on 2460 Bangla words extracted from 260 scene images.
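As a rough illustration of two of the steps named above, the following minimal sketch shows Otsu's threshold selection and a three-zone split driven by a headline and a baseline. It is not the authors' implementation. The K-means stage, the criterion for choosing K, region growing, and artifact removal are omitted because the abstract does not specify them. The function names, the assumption of dark text on a light background, and the projection-profile heuristics (headline as the densest row in the upper half, baseline as the row before the sharpest drop in density) are assumptions made for this example only.

```python
from typing import List, Tuple

Image = List[List[int]]


def otsu_threshold(pixels: List[int], levels: int = 256) -> int:
    """Return the gray level t that maximizes between-class variance
    for the two classes (<= t) and (> t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break  # upper class is empty from here on
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray: Image) -> Image:
    """Mark pixels at or below the Otsu threshold as text (1).
    Assumption: text is darker than the background."""
    t = otsu_threshold([p for row in gray for p in row])
    return [[1 if p <= t else 0 for p in row] for row in gray]


def row_projection(binary: Image) -> List[int]:
    """Count text pixels in every row."""
    return [sum(row) for row in binary]


def find_headline(binary: Image) -> int:
    """Assumed heuristic: the headline is the densest row in the upper half."""
    proj = row_projection(binary)
    upper = proj[: max(1, len(proj) // 2)]
    return max(range(len(upper)), key=upper.__getitem__)


def find_baseline(binary: Image, headline: int) -> int:
    """Assumed heuristic: the baseline is the row below the headline
    that precedes the largest drop in row density."""
    proj = row_projection(binary)
    best_row, best_drop = len(proj) - 1, 0
    for r in range(headline + 1, len(proj) - 1):
        drop = proj[r] - proj[r + 1]
        if drop > best_drop:
            best_drop, best_row = drop, r
    return best_row


def split_zones(binary: Image) -> Tuple[Image, Image, Image]:
    """Split a word image into upper, middle, and lower zones.
    The middle zone includes the headline and baseline rows."""
    h = find_headline(binary)
    b = find_baseline(binary, h)
    return binary[:h], binary[h : b + 1], binary[b + 1 :]
```

Calling `split_zones(binarize(gray))` on a grayscale word image, given as a list of rows of integer gray levels, returns the three zones as lists of rows. A numeral-only image, for which the abstract says zone detection is skipped, would need a separate check that this sketch does not attempt.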