Word level multi-script identification
Pattern Recognition Letters
Text area detection in digital documents images using textural features
CAIP'07 Proceedings of the 12th international conference on Computer analysis of images and patterns
Text localization and extraction from complex color images
ISVC'05 Proceedings of the First international conference on Advances in Visual Computing
HVS inspired system for script identification in indian multi-script documents
DAS'06 Proceedings of the 7th international conference on Document Analysis Systems
Multi-script and multi-oriented text localization from scene images
CBDAR'11 Proceedings of the 4th international conference on Camera-Based Document Analysis and Recognition
Extracting text areas is a necessary first step in preparing a complex document image for character recognition. In digital libraries, the resulting OCR'ed text gives access to document page images through keyword search. Gabor filters, which are known to simulate certain characteristics of the Human Visual System (HVS), have been used for this task by many researchers on scanned document images. Adapting such a scheme to camera-based document images is a relatively new approach. Moreover, designing appropriate filters to separate text areas, which are assumed to be rich in high-frequency components, from non-text areas is difficult, and it becomes harder when the clutter is also rich in high-frequency components. Other reported work on separating text from non-text areas has used geometrical or structural information, such as the shape and size of regions in binarized document images.

In this work, we combine the two approaches. We use connected component analysis (CCA) on binarized images to segment non-text areas based on the size of the connected regions. A Gabor function based filter bank is used to separate text and non-text areas of comparable size. The technique is shown to work efficiently on different kinds of scanned document images, on camera-captured document images, and sometimes on scenic images.
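The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 4-connectivity, the area limits, the use of the real (cosine) part of the Gabor function only, and the "mean absolute response" energy measure are all assumptions made for illustration. The abstract does not specify filter parameters or thresholds, so none are implied here.

```python
import math
from collections import deque

def connected_components(binary):
    """Return 4-connected foreground regions as lists of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(pixels)
    return comps

def filter_by_size(comps, min_area, max_area):
    """Keep components whose pixel count is plausible for text (limits are assumed)."""
    return [c for c in comps if min_area <= len(c) <= max_area]

def bounding_box(pixels):
    """Inclusive bounding box (y0, y1, x0, x1) of a component."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), max(ys), min(xs), max(xs)

def gabor_kernel(half, wavelength, theta, sigma):
    """Real Gabor kernel of side 2*half+1: Gaussian envelope times cosine carrier."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)  # axis along the carrier
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    n = 2 * half + 1
    mean = sum(sum(row) for row in kernel) / (n * n)
    # Subtract the mean so that uniform regions produce zero response.
    return [[v - mean for v in row] for row in kernel]

def texture_energy(gray, box, kernels):
    """Mean absolute filter response inside box, summed over the filter bank."""
    h, w = len(gray), len(gray[0])
    y0, y1, x0, x1 = box
    total = 0.0
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            for k in kernels:
                half = len(k) // 2
                resp = 0.0
                for dy in range(-half, half + 1):
                    for dx in range(-half, half + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:  # zero padding at borders
                            resp += k[dy + half][dx + half] * gray[yy][xx]
                total += abs(resp)
    return total / ((y1 - y0 + 1) * (x1 - x0 + 1))
```

In this sketch, `filter_by_size` plays the role of the size-based CCA stage, and `texture_energy` would be compared against a threshold (not given in the source) to decide whether a remaining region of comparable size is text-like. A bank would normally contain kernels at several orientations and wavelengths, for example `[gabor_kernel(3, 2.0, t, 2.0) for t in (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)]`; these values are placeholders only.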