Text extraction using component analysis and neuro-fuzzy classification on complex backgrounds

Authors:
Michael Makridis;Nikolaos E. Mitrakis;Nikolaos Nikolaou;Nikolaos Papamarkos
Affiliations:
Image Processing and Multimedia Laboratory, Department of Electrical & Computer Engineering, Democritus University of Thrace, Xanthi, Greece;European Commission, Joint Research Centre, Institute for Protection and Security of the Citizen, Ispra, VA, Italy;Image Processing and Multimedia Laboratory, Department of Electrical & Computer Engineering, Democritus University of Thrace, Xanthi, Greece;Image Processing and Multimedia Laboratory, Department of Electrical & Computer Engineering, Democritus University of Thrace, Xanthi, Greece
Venue:
SCIA'11: Proceedings of the 17th Scandinavian Conference on Image Analysis
Year:
2011

Abstract

This paper proposes a new technique for text extraction from complex color documents and book covers. The novelty of the proposed technique is that, unlike many existing techniques, it has been designed to deal successfully with documents that have complex backgrounds, variations in character size and different fonts. The number of colors in each document image is automatically reduced to a relatively small number (usually fewer than ten), and each document is then decomposed into binary images. Next, connected component analysis is performed and homogeneous groups of connected components (CCs) are formed. A set of features is extracted for each group of CCs. Finally, each group is classified as text or non-text using a neuro-fuzzy classifier. The proposed technique can be summarized in four consecutive stages. In the first stage, a pre-processing algorithm filters out noisy CCs. Afterwards, CC grouping is performed. Then, a set of nine local and global features is extracted for each group, and finally a classification procedure detects the document's text regions. Experimental results demonstrate the efficiency of the proposed technique, which can be further extended to deal with even more complex text extraction problems.
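The front half of the pipeline described above (one binary image per reduced color, connected component analysis, removal of noisy CCs) can be illustrated with a minimal sketch. The abstract does not specify the color reduction method, the connectivity, the noise criterion or the nine features, so everything below is an assumption: the input is taken to be an already color-reduced palette-index image, components use 8-connectivity, `min_area` is a hypothetical area threshold standing in for the pre-processing filter, and `cc_features` computes a few generic geometric descriptors, not the paper's feature set. CC grouping and the neuro-fuzzy classifier are not sketched.

```python
from collections import deque
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def binary_layers(indexed: List[List[int]]) -> Dict[int, List[List[bool]]]:
    """Split a color-reduced (palette-index) image into one binary image per color."""
    colors = sorted({c for row in indexed for c in row})
    return {c: [[p == c for p in row] for row in indexed] for c in colors}


def connected_components(binary: List[List[bool]]) -> List[List[Pixel]]:
    """Label 8-connected foreground components using breadth-first search."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components: List[List[Pixel]] = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels: List[Pixel] = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                # Visit all 8 neighbours that are foreground and not yet labelled.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def extract_ccs(indexed: List[List[int]], min_area: int = 3) -> Dict[int, List[List[Pixel]]]:
    """CCs per color layer, dropping those below the hypothetical `min_area` noise threshold."""
    return {
        color: [cc for cc in connected_components(layer) if len(cc) >= min_area]
        for color, layer in binary_layers(indexed).items()
    }


def cc_features(pixels: List[Pixel]) -> Dict[str, float]:
    """Generic bounding-box descriptors (illustrative only, not the paper's nine features)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    area = len(pixels)
    return {"height": height, "width": width, "area": area, "density": area / (height * width)}


# Usage: a toy 3-color image (0 = background, 1 = two "glyphs", 2 = a single noisy pixel).
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
ccs = extract_ccs(image, min_area=3)
for color, components in ccs.items():
    print(color, [cc_features(cc) for cc in components])
```

In this toy example the isolated pixel of color 2 is discarded by the area filter, while the two 2×2 blocks of color 1 survive as separate components. A real system would then group the surviving CCs and pass per-group feature vectors to the classifier.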