This paper reports a document retrieval technique that retrieves machine-printed Latin-based document images through word shape coding. Adopting the idea of image annotation, we propose a word shape coding scheme that converts each word image into a word shape code using a few shape features. The text content of an imaged document is thus captured by a document vector built from the word shape codes and their frequencies. Similarity between document images is then measured from these document vectors. We divide the retrieval process into two stages. Based on the observation that documents in the same language share a large number of high-frequency, language-specific stop words, the first stage retrieves documents written in the same language as the query document. The second stage then re-ranks the documents retrieved in the first stage by topic similarity. Experiments show that document images of different languages and topics can be retrieved properly using the proposed word shape coding scheme.
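
The two-stage idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not list the shape features, so the code below assumes a hypothetical three-class scheme (ascender, descender, x-height) and derives codes from plain text rather than from word images. Cosine similarity, the use of the `k` most frequent codes as a stand-in for stop words, and the names `shape_code`, `document_vector`, `retrieve`, and `lang_threshold` are likewise assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Assumed shape classes; the abstract does not specify the real features.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_code(word: str) -> str:
    """Map a word to a coarse shape code (hypothetical scheme)."""
    code = []
    for ch in word:
        if ch.isupper() or ch in ASCENDERS:
            code.append("A")  # rises above the x-height
        elif ch in DESCENDERS:
            code.append("D")  # drops below the baseline
        else:
            code.append("x")  # stays within the x-height band
    return "".join(code)


def document_vector(text: str) -> Counter:
    """Count how often each word shape code occurs in a document."""
    return Counter(shape_code(w) for w in text.split() if w.isalpha())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_codes(vec: Counter, k: int) -> Counter:
    """Keep the k most frequent codes, a rough proxy for stop words."""
    return Counter(dict(vec.most_common(k)))


def retrieve(query: str, docs: dict, lang_threshold: float = 0.5, k: int = 5) -> list:
    """Stage 1 filters by high-frequency codes; stage 2 re-ranks by full vectors."""
    q_vec = document_vector(query)
    q_top = top_codes(q_vec, k)
    vectors = {name: document_vector(text) for name, text in docs.items()}
    # Stage 1: keep documents whose frequent codes resemble the query's.
    same_language = {
        name: vec
        for name, vec in vectors.items()
        if cosine(q_top, top_codes(vec, k)) >= lang_threshold
    }
    # Stage 2: re-rank the survivors by similarity of the full vectors.
    return sorted(
        same_language,
        key=lambda name: cosine(q_vec, same_language[name]),
        reverse=True,
    )
```

For example, `retrieve(query_text, {"doc1": text1, "doc2": text2})` returns the names of the documents that pass the first stage, ordered by their second-stage similarity to the query. A coarse code such as this one maps many different words to the same string, which is why a real scheme would need more discriminating shape features.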