Keyword spotting on Korean document images by matching the keyword image

Authors:
Soo Hyung Kim;Sang Cheol Park;Chang Bu Jeong;Ji Soo Kim;Hyuk Ro Park;Guee Sang Lee
Affiliations:
Department of Computer Science, Chonnam National University, Kwangju, Korea;Department of Computer Science, Chonnam National University, Kwangju, Korea;Department of Internet Software, Honam University, Kwangju, Korea;Department of Computer Science, Chonnam National University, Kwangju, Korea;Department of Computer Science, Chonnam National University, Kwangju, Korea;Department of Computer Science, Chonnam National University, Kwangju, Korea
Venue:
ICADL'05 Proceedings of the 8th international conference on Asian Digital Libraries: implementing strategies and sharing experiences
Year:
2005

Abstract

In this paper, we propose a keyword spotting system for Korean document images and compare it with an OCR-based document retrieval system. The system consists of three steps: character segmentation, feature extraction for the query keyword (query creation), and word-to-word matching. In the character segmentation step, we propose an effective method for separating adjacent characters that touch each other. In the query creation step, the feature vector for the query keyword is constructed by combining the features of its constituent characters. In the matching step, word-to-word matching is performed on the basis of character-level matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based system for searching keywords in Korean document images, especially when the document quality is poor.
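The three steps summarized in the abstract can be illustrated with a toy pipeline. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the abstract does not specify the segmentation method, the character features, or the similarity measure, so every helper below (`segment_characters`, `split_run`, `zoning_features`, `build_query`, `word_score`, `spot`) and every parameter (`max_width`, `zones`, `threshold`) is hypothetical. Touching characters are split at the minimum of the vertical projection profile, a generic heuristic swapped in for the paper's own method; characters are described by zoning (grid ink-density) features; the query is the ordered sequence of stored per-character prototype features; and a word's score is the mean of its character-level similarities.

```python
import math
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]  # binary image: rows of 0 (background) / 1 (ink)
Feature = List[float]


def vertical_projection(img: Image) -> List[int]:
    """Count ink pixels in each column."""
    return [sum(row[c] for row in img) for c in range(len(img[0]))]


def split_run(proj: List[int], lo: int, hi: int, max_width: int) -> List[Tuple[int, int]]:
    """Recursively cut an ink run wider than max_width (assumed to hold touching
    characters) at the column of minimum projection within the run's interior."""
    if hi - lo <= max_width:
        return [(lo, hi)]
    q = (hi - lo) // 4
    cut = min(range(lo + max(1, q), hi - q), key=lambda c: proj[c])
    return split_run(proj, lo, cut, max_width) + split_run(proj, cut, hi, max_width)


def segment_characters(img: Image, max_width: int) -> List[Image]:
    """Split a word image into character images: blank columns separate characters,
    and over-wide runs are split by split_run (a generic heuristic, not the paper's)."""
    proj = vertical_projection(img)
    spans: List[Tuple[int, int]] = []
    start = None
    for c, v in enumerate(proj + [0]):  # trailing 0 closes a final ink run
        if v and start is None:
            start = c
        elif not v and start is not None:
            spans.extend(split_run(proj, start, c, max_width))
            start = None
    return [[row[lo:hi] for row in img] for lo, hi in spans]


def zoning_features(ch: Image, zones: int = 2) -> Feature:
    """Hypothetical character feature: ink density in each cell of a zones x zones grid."""
    h, w = len(ch), len(ch[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            rows = range(zr * h // zones, (zr + 1) * h // zones)
            cols = range(zc * w // zones, (zc + 1) * w // zones)
            cells = [ch[r][c] for r in rows for c in cols]
            feats.append(sum(cells) / len(cells) if cells else 0.0)
    return feats


def build_query(keyword: str, prototypes: Dict[str, Feature]) -> List[Feature]:
    """Query creation: combine stored per-character features in keyword order."""
    return [prototypes[ch] for ch in keyword]


def char_similarity(a: Feature, b: Feature) -> float:
    """Map Euclidean distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def word_score(query: Sequence[Feature], word: Sequence[Feature]) -> float:
    """Word-to-word score = mean character similarity; 0 if lengths differ."""
    if not query or len(query) != len(word):
        return 0.0
    return sum(char_similarity(q, w) for q, w in zip(query, word)) / len(query)


def spot(keyword: str, prototypes: Dict[str, Feature], word_images: List[Image],
         max_width: int, threshold: float = 0.8) -> List[Tuple[int, float]]:
    """Return (index, score) for every word image whose score reaches threshold."""
    query = build_query(keyword, prototypes)
    hits = []
    for idx, img in enumerate(word_images):
        feats = [zoning_features(c) for c in segment_characters(img, max_width)]
        score = word_score(query, feats)
        if score >= threshold:
            hits.append((idx, score))
    return hits
```

For example, if two toy 4x3 glyph images are registered as prototypes for `"A"` and `"B"`, `spot("AB", prototypes, words, max_width=3)` matches both a word image in which the glyphs are separated by a blank column and one in which they touch, because the projection-minimum cut recovers the boundary. Requiring the query and the candidate word to have the same number of characters is a simplification made here; a practical system would also need to tolerate segmentation errors.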