Document image retrieval is the task of retrieving document images relevant to a user's query. Most existing methods based on word-level indexing rely on the "bag of words" representation, which originated in the field of information retrieval. This paper presents a new representation of documents that uses additional information about the location of words in pages to improve retrieval performance. We consider a page relevant to a query if it contains the query's terms densely. This notion is embodied as density distributions of terms calculated in the proposed method. Its performance is further improved with the help of "pseudo relevance feedback", i.e., a method of expanding a query by analyzing pages. Experimental results on English document images show that the proposed method is superior to conventional methods of electronic document retrieval at recall levels 0.0-0.6.
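
The abstract does not say how the density distributions are computed, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a page is reduced to a one-dimensional sequence of recognized words, that each occurrence of a query term is smoothed with a Hann window, that the per-term densities are summed, and that a page is scored by the peak of the combined density. The function names, the window shape, and the `width` parameter are all hypothetical choices made for illustration.

```python
import math


def hann_window(width):
    """Symmetric smoothing kernel of length `width` (assumed shape, not from the paper)."""
    return [0.5 * (1 - math.cos(2 * math.pi * (i + 1) / (width + 1)))
            for i in range(width)]


def term_density(words, term, window):
    """Spread each occurrence of `term` over neighbouring word positions."""
    half = len(window) // 2
    density = [0.0] * len(words)
    for pos, word in enumerate(words):
        if word != term:
            continue
        for k, weight in enumerate(window):
            j = pos + k - half
            if 0 <= j < len(words):
                density[j] += weight
    return density


def page_score(words, query_terms, width=5):
    """Score a page by the peak of the summed density of all query terms."""
    window = hann_window(width)
    combined = [0.0] * len(words)
    for term in query_terms:
        for j, value in enumerate(term_density(words, term, window)):
            combined[j] += value
    return max(combined, default=0.0)


query = ["image", "retrieval"]
# Query terms adjacent on the page.
dense_page = ["x", "image", "retrieval", "x", "x", "x", "x", "x", "x", "x"]
# Same terms, same counts, but far apart.
sparse_page = ["image", "x", "x", "x", "x", "x", "x", "x", "x", "retrieval"]

dense_score = page_score(dense_page, query)
sparse_score = page_score(sparse_page, query)
```

Both pages contain each query term exactly once, so a bag-of-words representation would treat them the same. Under the assumptions above, the page where the terms occur close together receives the higher score, which is the behaviour the abstract describes.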