A Keyword Spotting System of Korean Document Images
ICADL '02 Proceedings of the 5th International Conference on Asian Digital Libraries: Digital Libraries: People, Knowledge, and Technology
In this paper, we propose a keyword spotting system for Korean document images and compare it with an OCR-based document retrieval system. The system consists of three steps: character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for separating adjacent characters that are connected. In the query creation step, the feature vector for the query is constructed by combining the features of its constituent characters. In the matching step, word-to-word matching is performed on the basis of character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one at searching for a keyword in Korean document images, especially when the quality of the documents is quite poor.
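The query creation and matching steps can be illustrated with a minimal sketch. The abstract does not specify the features, the distance measure, or the decision rule, so everything below is an assumption made for illustration: the three-dimensional template vectors in `TEMPLATES`, the use of Euclidean distance, the averaging of per-character distances, and the `threshold` value are all hypothetical, as are the function names.

```python
from math import dist

# Hypothetical per-character templates (character -> feature vector).
# The values are invented for illustration, not taken from the paper.
TEMPLATES = {
    "한": [0.9, 0.1, 0.4],
    "국": [0.2, 0.8, 0.5],
    "어": [0.6, 0.6, 0.1],
}


def build_query(keyword, templates=TEMPLATES):
    """Combine the feature vectors of the keyword's characters, in order."""
    return [templates[ch] for ch in keyword]


def word_distance(query, word):
    """Word-to-word distance as the mean character-to-character distance.

    `word` is a list of feature vectors, one per segmented character.
    Words whose character count differs from the query never match here;
    this is a simplifying assumption.
    """
    if not query or len(query) != len(word):
        return float("inf")
    return sum(dist(q, c) for q, c in zip(query, word)) / len(query)


def spot(keyword, words, threshold=0.2):
    """Return the indices of the words whose distance to the keyword is small."""
    query = build_query(keyword)
    return [i for i, w in enumerate(words) if word_distance(query, w) <= threshold]
```

In this sketch, `spot` takes a keyword and a list of segmented words, each given as a list of per-character feature vectors, and returns the positions of the matching words. Because matching is done on features, no character has to be recognized, which is one plausible reason such a scheme could degrade more gracefully than OCR on poor-quality scans.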