Spotting Where to Read on Pages - Retrieval of Relevant Parts from Page Images

Authors:
Koichi Kise;Masaaki Tsujino;Keinosuke Matsumoto
Venue:
DAS '02 Proceedings of the 5th International Workshop on Document Analysis Systems V
Year:
2002

Abstract

This paper presents a new method of document image retrieval that can spot the parts of page images relevant to a user's query. This improves the usability of retrieval, since a user can see where to read on the retrieved pages. It can also improve the effectiveness of retrieval, because the method is largely unaffected by irrelevant parts of pages. The method is based on the assumption that parts of page images which densely contain the keywords of a query are relevant to that query. The proposed method has two characteristics: (1) two-dimensional density distributions of keywords are calculated for ranking parts of page images; (2) the method relies only on the distribution of characters, so that it is not affected by errors of layout analysis. Based on experimental results of retrieving Japanese newspaper articles, we show that the proposed method is superior to a method that does not deal with parts of pages, and is sometimes equivalent to a method of electronic document retrieval that works on error-free text.
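The idea of ranking parts of a page by keyword density can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the window function, the grid resolution, or the scoring formula, so the triangular window, the cell grid, and the names `keyword_density` and `best_part` below are all assumptions made for illustration. Keyword positions are assumed to come from the character positions on the page.

```python
from typing import List, Tuple


def keyword_density(
    hits: List[Tuple[int, int]], width: int, height: int, radius: int
) -> List[List[float]]:
    """Spread each keyword hit (x, y) over nearby grid cells.

    Assumption: a separable triangular window of the given radius;
    the abstract does not say which window the paper uses.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y in hits:
        for yy in range(max(0, y - radius), min(height, y + radius + 1)):
            for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                # Weight falls off linearly with distance from the hit.
                wx = 1 - abs(xx - x) / (radius + 1)
                wy = 1 - abs(yy - y) / (radius + 1)
                grid[yy][xx] += wx * wy
    return grid


def best_part(grid: List[List[float]]) -> Tuple[int, int, float]:
    """Return (x, y, score) of the first cell with the highest density."""
    best = (0, 0, grid[0][0])
    for y, row in enumerate(grid):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best


# Three keyword hits clustered in one corner and one isolated hit.
hits = [(2, 2), (3, 2), (2, 3), (8, 8)]
grid = keyword_density(hits, width=10, height=10, radius=2)
x, y, score = best_part(grid)
print(x, y, round(score, 3))
```

In this sketch the cluster of three hits produces a higher peak than the isolated hit, so the part of the page around the cluster would be ranked first. Pages could then be ranked by their peak score, which is one way to keep irrelevant parts of a page from diluting its score.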