In conventional information retrieval, finding a user's search terms in a document is simple. When the document is not available in machine-readable format, optical character recognition (OCR) can usually be performed. We have developed a technique for performing information retrieval directly on document images, with accuracy of great practical utility. The method makes generalisations about the images of characters, classifies them into a small set of character shape codes, and agglomerates these codes into word tokens. The resulting tokens are sufficiently specific in their representation of the underlying words to allow reasonable retrieval performance. Using a collection of over 250 Mbytes of document text and queries with known relevance assessments, we present a series of experiments that determine how various parameters of the retrieval strategy affect retrieval performance, and we obtain surprisingly good results.
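As a rough illustration of the pipeline described above (character shape classes, then word tokens, then retrieval), here is a minimal Python sketch under several assumptions. A self-contained example cannot include page images, so it stands in for the image classifier by mapping the characters of already-known text to shape classes. In the described method these codes come from classifying character images. The five-class code set (`A`, `x`, `g`, `i`, `j`), the helper names `shape_code`, `shape_tokens`, `build_index` and `rank`, and the tf-idf scoring are all hypothetical choices made for illustration. They are not the paper's exact code set or ranking function.

```python
import math
import re
from collections import Counter

# Hypothetical, simplified shape classes (an assumption for illustration,
# not necessarily the code set used in the paper):
#   'A' - capitals, digits and letters with ascenders
#   'x' - x-height letters with no ascender or descender
#   'g' - letters with descenders
#   'i' - dotted letter i
#   'j' - dotted letter j (descender plus dot)
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")


def shape_code(ch: str) -> str:
    """Map one character to its (hypothetical) shape class.

    Stands in for classifying a character image in the real system.
    """
    if ch == "i":
        return "i"
    if ch == "j":
        return "j"
    if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
        return "A"
    if ch in DESCENDERS:
        return "g"
    return "x"


def shape_tokens(text: str) -> list[str]:
    """Split text into words and replace each word by its shape-code token."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return ["".join(shape_code(c) for c in w) for w in words]


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Map each document id to the term frequencies of its shape tokens."""
    return {doc_id: Counter(shape_tokens(text)) for doc_id, text in docs.items()}


def rank(index: dict[str, Counter], query: str) -> list[tuple[str, float]]:
    """Rank documents by a simple tf-idf sum over the query's shape tokens.

    The tf-idf weighting is an assumed, standard scheme, not the paper's.
    """
    n = len(index)
    q_tokens = set(shape_tokens(query))
    # Document frequency of each query token across the collection.
    df = {t: sum(1 for tf in index.values() if t in tf) for t in q_tokens}
    scores = {}
    for doc_id, tf in index.items():
        score = 0.0
        for t in q_tokens:
            if tf[t]:  # Counter returns 0 for absent tokens
                score += (1 + math.log(tf[t])) * math.log(1 + n / df[t])
        scores[doc_id] = score
    # Highest-scoring documents first; ties keep insertion order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "Document image retrieval without optical character recognition",
        "d2": "Compressed images of printed pages",
        "d3": "Weather report for the coming week",
    }
    index = build_index(docs)
    print(shape_tokens("image retrieval"))
    print(rank(index, "image retrieval"))
```

With the sample collection, `d1` ranks first, because it is the only document containing the shape tokens for "image" (`ixxgx`) and "retrieval" (`xxAxixxxA`). Note that "images" in `d2` yields a different token (`ixxgxx`) and so does not match. Because the mapping discards most letter identity, distinct words can collide: under this sketch "cat" and "act" both become `xxA`. The approach therefore depends on tokens being specific enough in practice, as the abstract reports.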