A document image preprocessing system for keyword spotting

  • Authors:
  • C. B. Jeong;S. H. Kim

  • Affiliations:
  • Department of Computer Science, Chonnam National University, Gwangju, Korea;Department of Computer Science, Chonnam National University, Gwangju, Korea

  • Venue:
  • ICADL'04 Proceedings of the 7th International Conference on Digital Libraries: International Collaboration and Cross-Fertilization
  • Year:
  • 2004

Abstract

This paper presents a system that segments a printed document image into word images, which can then be used for document image retrieval based on keyword spotting. The system consists of three image-processing modules: skew correction, document layout analysis, and word segmentation. To demonstrate the practical applicability and flexibility of the approach, we test the system on 50 images of Korean papers and 50 images of English papers, provided through the full-text image retrieval services of the Korea Information Science Society and the Pattern Recognition Society, respectively. Currently, word extraction accuracy ranges from 90% to 95%, depending on the language of the document.
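
The abstract does not say how the word segmentation module works. As a rough illustration of what such a module might do, the sketch below uses a common technique, a vertical projection profile with a gap threshold, on a single binarized text line. The function names, the list-of-rows image representation, and the `min_gap` parameter are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Tuple


def column_profile(line_img: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary text-line image."""
    return [sum(col) for col in zip(*line_img)]


def segment_words(line_img: List[List[int]], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split a text line into words at wide runs of blank columns.

    Returns (start, end) column ranges with end exclusive. Blank runs
    narrower than min_gap are treated as gaps between characters and stay
    inside the word. min_gap is a hypothetical tuning parameter; a real
    system would likely derive it from the gap distribution of the page.
    """
    words: List[Tuple[int, int]] = []
    start = None      # first ink column of the word being built
    last_ink = None   # most recent column that contained ink
    for x, count in enumerate(column_profile(line_img)):
        if count == 0:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The blank run before this column is wide enough to end a word.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words
```

In this sketch a larger `min_gap` merges neighbouring components into one word and a smaller one splits them apart. A fixed threshold is the simplest choice, and it is a plausible place for accuracy to differ between scripts, since character and word spacing in Korean and English text are not the same.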