Research on born-digital image text extraction based on conditional random field

  • Authors:
  • Zhang Jian; Cheng RenHong; Wang Kai; Zhao Hong

  • Affiliations:
  • College of Computer and Control Engineering, Nankai University, 94#, Weijin Road, Tianjin, China; College of Software, Nankai University, 94#, Weijin Road, Tianjin, China; College of Computer and Control Engineering, Nankai University, 94#, Weijin Road, Tianjin, China; College of Computer and Control Engineering, Nankai University, 94#, Weijin Road, Tianjin, China

  • Venue:
  • International Journal of High Performance Systems Architecture
  • Year:
  • 2014

Abstract

With the number of digital videos and images in e-mails and web pages increasing tremendously, text extraction from images has become more important than ever. Born-digital images are generated directly on a computer, and the text they contain is important for understanding their semantics. Although many methods have been proposed in recent years for extracting text from natural scene images, text detection and extraction from born-digital images remains a challenge. This paper proposes a novel method to segment the text connected components (CCs) from a born-digital image. First, the given image is binarised using wavelet theory to obtain all candidate text CCs. Second, the extracted CCs are classified with a conditional random field (CRF), a probabilistic graphical model widely used in natural language processing, to label the text CCs. Experimental results show that the proposed method can effectively extract text from born-digital images.
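
The first stage described in the abstract, obtaining candidate CCs from a binarised image, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the wavelet-based binarisation, so a plain global threshold is substituted here as a stand-in, and the function names (`binarise`, `connected_components`, `bounding_box`), the threshold value, and the choice of 4-connectivity are all assumptions made for illustration. The CRF classification stage is not shown.

```python
from collections import deque


def binarise(gray, threshold=128):
    """Stand-in for the paper's wavelet-based binarisation (assumption):
    pixels darker than `threshold` become foreground (1)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Group foreground pixels into CCs using 4-connectivity (assumption).
    Returns a list of components, each a list of (row, col) pixels."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) of one component."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), min(xs), max(ys), max(xs)


# Hypothetical 4x5 greyscale image: 0 is dark (candidate text), 255 is background.
gray = [
    [255, 0, 255, 255, 0],
    [255, 0, 255, 255, 0],
    [255, 255, 255, 255, 255],
    [0, 0, 255, 255, 255],
]
candidates = connected_components(binarise(gray))
boxes = [bounding_box(cc) for cc in candidates]
```

Each candidate CC, together with simple features such as its bounding box, would then be passed to a classifier; in the paper that role is played by the CRF, which labels each CC as text or non-text.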