A knowledge-based system for extracting text-lines from mixed and overlapping text/graphics compound document images

  • Authors:
  • Yen-Lin Chen; Zeng-Wei Hong; Cheng-Hung Chuang

  • Affiliations:
  • Department of Computer Science and Information Engineering, National Taipei University of Technology, 1, Sec. 3, Chung-hsiao E. Rd., Taipei 10608, Taiwan; Department of Computer Science and Information Engineering, Asia University, 500 Liufeng Rd., Wufeng, Taichung 41354, Taiwan; Department of Computer Science and Information Engineering, Asia University, 500 Liufeng Rd., Wufeng, Taichung 41354, Taiwan

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Abstract

This paper presents a new knowledge-based system for extracting and identifying text-lines in real-life compound document images that mix text and graphics. The system first decomposes the document image into distinct object planes, each holding homogeneous objects: textual regions of interest, non-text objects such as graphics and pictures, and background textures. A knowledge-based text extraction and identification method then obtains the text-lines with different characteristics in each plane. Because the method is rule-driven, the system can be adapted to new types of complex real-life document images simply by adding or updating rules. Experimental and comparative results demonstrate that the system effectively extracts text-lines spanning a wide range of illumination levels, sizes, and font styles from mixed and overlapping text/graphics compound document images.
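
The two-stage idea in the abstract (plane decomposition, then rule-based text-line extraction per plane) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not say how planes are formed or which rules are used, so the sketch assumes fixed intensity thresholds for the decomposition, 4-connected components as candidate objects, and a single hypothetical size rule (`small_enough`). All function names, the `RULES` list, and the toy `IMAGE` are illustrative assumptions.

```python
from collections import deque

def decompose_into_planes(image, thresholds):
    """Split a grayscale image into binary planes, one per intensity band.
    Assumption: fixed thresholds stand in for the paper's decomposition."""
    bounds = [0] + list(thresholds) + [256]
    return [[[lo <= px < hi for px in row] for row in image]
            for lo, hi in zip(bounds, bounds[1:])]

def connected_components(plane):
    """Bounding boxes (top, left, bottom, right) of 4-connected regions."""
    h, w = len(plane), len(plane[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not plane[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and plane[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def small_enough(box, max_size=4):
    """Hypothetical rule: character-like components are small in both axes."""
    top, left, bottom, right = box
    return bottom - top + 1 <= max_size and right - left + 1 <= max_size

# Extending the system means appending predicates here (illustrative only).
RULES = [small_enough]

def text_like(boxes, rules=RULES):
    """Keep only the boxes that satisfy every rule."""
    return [b for b in boxes if all(rule(b) for rule in rules)]

def group_into_lines(boxes):
    """Group boxes whose vertical extents overlap into text-lines."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b[0], b[1])):
        for line in lines:
            if box[0] <= line["bottom"] and box[2] >= line["top"]:
                line["boxes"].append(box)
                line["top"] = min(line["top"], box[0])
                line["bottom"] = max(line["bottom"], box[2])
                break
        else:
            lines.append({"top": box[0], "bottom": box[2], "boxes": [box]})
    return [sorted(line["boxes"], key=lambda b: b[1]) for line in lines]

# Toy image: light background (200), three dark strokes (20), one wide graphic (120).
BG, TX, GR = 200, 20, 120
IMAGE = [
    [BG] * 10,
    [BG, TX, BG, TX, BG, TX, BG, BG, BG, BG],
    [BG, TX, BG, TX, BG, TX, BG, BG, BG, BG],
    [BG] * 10,
    [GR] * 10,
    [GR] * 10,
]

planes = decompose_into_planes(IMAGE, thresholds=[64, 160])
lines_per_plane = [group_into_lines(text_like(connected_components(p))) for p in planes]
```

On this toy input, only the darkest plane yields a text-line; the wide graphic and the background each form one component that the size rule rejects. Keeping the rules in a plain list is one simple way to mirror the abstract's point that new document types are handled by adding or updating rules rather than changing the pipeline.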