Document page retrieval based on geometric layout features
Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication
In this paper we propose a scheme for querying large document collections by document layout. We develop a layout-indexing model of a collection suited to the quick retrieval of the top k relevant documents. For the sake of scalability, we avoid directly evaluating the similarity between a query and each document in the collection; instead, their similarity is approximated by the similarity between their projections onto a set of representative blocks inferred from the collection during the indexing step. The technique also introduces new functions for relevance ranking and cluster pruning that keep retrieval and ranking scalable.
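The projection idea can be illustrated with a minimal sketch. The abstract does not define the projection, the similarity function, or how representative blocks are inferred, so everything below is an assumption made for illustration: blocks are rectangles `(x, y, width, height)` in normalized page coordinates, a layout is projected onto each representative block by its best intersection-over-union overlap, and two projections are compared by cosine similarity. The helper names (`iou`, `project`, `build_index`, `top_k`) are hypothetical, and the cluster pruning mentioned in the abstract is omitted, so this sketch still scans every stored projection vector.

```python
import heapq
import math


def iou(a, b):
    """Intersection-over-union of two axis-aligned rectangles (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis, clamped at zero when the blocks are disjoint.
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def project(layout, representatives):
    """Assumed projection: best overlap of the layout with each representative block."""
    return [
        max((iou(block, rep) for block in layout), default=0.0)
        for rep in representatives
    ]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def build_index(collection, representatives):
    """Indexing step: store one projection vector per document id."""
    return {
        doc_id: project(layout, representatives)
        for doc_id, layout in collection.items()
    }


def top_k(query_layout, index, representatives, k):
    """Rank documents by similarity of projections, not by direct layout comparison."""
    query_vec = project(query_layout, representatives)
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in index.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]
```

In this sketch the cost of a query depends on the number of representative blocks rather than on the number of blocks in each stored page, which is one way to read the scalability argument; a pruning step over clusters of projection vectors would be needed to avoid the full scan.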