A document image preprocessing system for keyword spotting

Authors:
C. B. Jeong; S. H. Kim
Affiliations:
Department of Computer Science, Chonnam National University, Gwangju, Korea; Department of Computer Science, Chonnam National University, Gwangju, Korea
Venue:
ICADL'04 Proceedings of the 7th international Conference on Digital Libraries: international collaboration and cross-fertilization
Year:
2004

Abstract

This paper presents a system that segments a printed document image into word images, which can then be used for document image retrieval based on keyword spotting. The system consists of three image manipulation modules: skew correction, document layout analysis, and word segmentation. To assess its practical applicability and flexibility, we test the system on 50 images of Korean papers and 50 images of English papers, provided through the full-text image retrieval services of the Korea Information Science Society and the Pattern Recognition Society, respectively. Currently, the accuracy of word extraction ranges from 90% to 95%, depending on the language of the document.
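
The abstract does not say how the word segmentation module works. As a rough illustration only, one common approach for printed text is to split a binarized text line at wide gaps in its vertical projection profile. The sketch below assumes that approach; the function name `segment_words` and the threshold `min_gap` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def segment_words(line: List[List[int]], min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split one binarized text line (1 = ink, 0 = background) into word spans.

    Returns half-open column ranges (start, end). A run of at least `min_gap`
    empty columns is treated as a word boundary; shorter runs are assumed to be
    spacing between characters. `min_gap` is a hand-picked, hypothetical value.
    """
    if not line:
        return []
    width = len(line[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in line) for x in range(width)]

    words: List[Tuple[int, int]] = []
    start = None     # first column of the word being built
    last_ink = None  # most recent column that contained ink
    for x, count in enumerate(profile):
        if count == 0:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The empty run before this column is wide enough: close the word.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words


line = [
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
]
print(segment_words(line))  # [(0, 4), (7, 9)]
```

A fixed gap threshold is the simplest possible choice. Character spacing differs between Korean and English print, so a practical system would likely need to adapt the threshold per document or per language.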