In Chinese information retrieval, indexing a Chinese text document is straightforward: the text only needs to be segmented into phrases. When the document is a Chinese document image (a non-ASCII file), the usual approach is to first convert the image into a text file with Chinese optical character recognition (OCR) and then index the result with an information retrieval algorithm. However, OCR is time-consuming, which reduces retrieval efficiency. This paper proposes an indexing method based on a stroke density code. The document image is first segmented into individual Chinese character images, the stroke density of each character image is then calculated, and finally the stroke density code of each character image is derived. The indexing method is fast and robust to noise. This paper also presents a retrieval method for Chinese document images built on this index, with particular attention to indexing and retrieval for duplicate detection. The indexing method is validated through its application to keyword spotting and duplicate detection.
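The abstract does not specify how the stroke density code is computed, so the following is only a minimal sketch of the general idea under stated assumptions: each segmented character is a binary image (1 = stroke pixel), the code is the quantized ink density of each cell in a small grid, and the index maps codes to character positions. The function names, the `grid` and `levels` parameters, and the exact-match lookup are all hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict


def stroke_density_code(glyph, grid=2, levels=4):
    """Return a tuple of quantized ink densities, one per grid cell.

    Assumption: `glyph` is a list of rows of 0/1 values (1 = stroke pixel)
    with at least `grid` rows and `grid` columns.
    """
    height, width = len(glyph), len(glyph[0])
    code = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            ink = sum(glyph[y][x] for y in range(y0, y1) for x in range(x0, x1))
            density = ink / ((y1 - y0) * (x1 - x0))
            # Map a density in [0, 1] to an integer level in [0, levels - 1].
            code.append(min(int(density * levels), levels - 1))
    return tuple(code)


def build_index(codes):
    """Inverted index: stroke density code -> positions where it occurs."""
    index = defaultdict(list)
    for position, code in enumerate(codes):
        index[code].append(position)
    return index


def spot_keyword(index, codes, query_codes):
    """Return start positions where the query code sequence occurs exactly."""
    hits = []
    # Use the index to find candidates for the first query character,
    # then verify the remaining characters against the code sequence.
    for start in index.get(query_codes[0], []):
        if codes[start:start + len(query_codes)] == query_codes:
            hits.append(start)
    return hits


# Toy 4x4 "character images" standing in for segmented glyphs.
full = [[1, 1, 1, 1] for _ in range(4)]
empty = [[0, 0, 0, 0] for _ in range(4)]
left = [[1, 1, 0, 0] for _ in range(4)]

document = [full, left, empty, left, empty]
codes = [stroke_density_code(g) for g in document]
index = build_index(codes)
query = [stroke_density_code(left), stroke_density_code(empty)]
hits = spot_keyword(index, codes, query)
```

Because each code needs only pixel counts, indexing avoids full character recognition, which is consistent with the speed argument in the abstract. A practical system would likely need tolerant matching rather than the exact comparison used here, since noise can shift a cell's density across a quantization boundary.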