Research on born-digital image text extraction based on conditional random field

Authors:
Zhang Jian; Cheng RenHong; Wang Kai; Zhao Hong
Affiliations:
College of Computer and Control Engineering, Nankai University, 94#, Weijin Road, Tianjin, China; College of Software, Nankai University, 94#, Weijin Road, Tianjin, China; College of Computer and Control Engineering, Nankai University, 94#, Weijin Road, Tianjin, China; College of Computer and Control Engineering, Nankai University, 94#, Weijin Road, Tianjin, China
Venue:
International Journal of High Performance Systems Architecture
Year:
2014

Abstract

With the number of digital videos and images in e-mails and web pages increasing tremendously, text extraction from images is more important than ever. Born-digital images are generated directly on a computer, and the text they contain is an important aid to understanding their semantics. Although many methods have been proposed over the past years for text extraction from natural scene images, text detection and extraction from born-digital images remains a challenge. This paper proposes a novel method to segment the text connected components (CCs) from a born-digital image. Firstly, wavelet-based binarisation is applied to the given image to obtain all candidate text CCs. Secondly, the extracted CCs are classified with a conditional random field (CRF), a probabilistic graphical model that has been widely used in natural language processing, to label the text CCs. Experimental results show that the proposed method can effectively extract text from born-digital images.
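
The first stage described in the abstract (binarise, then collect candidate CCs) can be illustrated with a minimal sketch. This is not the paper's implementation: a fixed global threshold stands in for the wavelet-based binarisation, no CRF is trained, and the names `binarise`, `connected_components`, `describe` and the toy image `gray` are hypothetical. The per-component area and bounding box are only examples of the kind of features a later classifier might consume.

```python
from collections import deque

def binarise(gray, threshold=128):
    """Stand-in binarisation (assumption): pixels darker than the
    threshold become foreground (1). The paper uses wavelets instead."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def connected_components(binary):
    """Collect 4-connected foreground regions with a breadth-first flood fill."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                # Visit the four direct neighbours of the current pixel.
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components

def describe(pixels):
    """Simple per-component features: pixel count and bounding box
    as (top, left, bottom, right)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return {"area": len(pixels), "bbox": (min(ys), min(xs), max(ys), max(xs))}

# Toy 4x6 greyscale image with two dark strokes on a white background.
gray = [
    [255, 255, 255, 255, 255, 255],
    [255,  10,  10, 255,  20, 255],
    [255,  10, 255, 255,  20, 255],
    [255, 255, 255, 255,  20, 255],
]
candidates = [describe(c) for c in connected_components(binarise(gray))]
```

In a full pipeline, each entry of `candidates` would become a node to be labelled as text or non-text, which is the role the abstract assigns to the CRF.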