Passage-Based Document Retrieval as a Tool for Text Mining with User's Information Needs

Authors:
Koichi Kise; Markus Junker; Andreas Dengel; Keinosuke Matsumoto
Venue:
DS '01 Proceedings of the 4th International Conference on Discovery Science
Year:
2001

Abstract

Document retrieval can be regarded as a basic but important tool for text mining, one that can take a user's information need into account. However, retrieval becomes difficult when long, multi-topic documents must be retrieved from a very short description of the information need (a few keywords). In this paper, we focus on this problem, which is typical of real-world applications. We show experimentally that passage-based document retrieval has an advantage over conventional document retrieval in such circumstances. Passage-based document retrieval judges a document's relevance to the information need from only small fractions (passages) of the document. As the passage-based method, we employ one based on density distributions of keywords. We compare it with three conventional document retrieval methods: the vector space model, pseudo-feedback, and latent semantic indexing. The experimental results show that the passage-based method outperforms the conventional methods when long documents must be retrieved with short queries.
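
The contrast between scoring a passage and scoring a whole document can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the Hann-shaped smoothing window, the window width, the unweighted keyword hits, the use of the density peak as the document score, and all function names (`hann_window`, `keyword_density`, `passage_score`, `cosine_score`) are hypothetical and not taken from the paper. The cosine function stands in for a bare term-frequency vector space model without term weighting.

```python
import math
from collections import Counter


def hann_window(width):
    """Hann-shaped weights for offsets -(width // 2)..(width // 2); weight 1.0 at offset 0."""
    half = width // 2
    return [0.5 * (1.0 + math.cos(math.pi * k / (half + 1)))
            for k in range(-half, half + 1)]


def keyword_density(tokens, query_terms, width=5):
    """Smoothed density of query-term occurrences at every token position."""
    query = set(query_terms)
    hits = [1.0 if token in query else 0.0 for token in tokens]
    half = width // 2
    window = hann_window(width)
    density = []
    for pos in range(len(tokens)):
        total = 0.0
        for offset, weight in zip(range(-half, half + 1), window):
            j = pos + offset
            if 0 <= j < len(hits):
                total += weight * hits[j]
        density.append(total)
    return density


def passage_score(tokens, query_terms, width=5):
    """Assumed passage-based score: the peak of the keyword density curve."""
    return max(keyword_density(tokens, query_terms, width), default=0.0)


def cosine_score(tokens, query_terms):
    """Whole-document baseline: cosine similarity of raw term-frequency vectors."""
    doc_tf = Counter(tokens)
    query_tf = Counter(query_terms)
    dot = sum(doc_tf[term] * count for term, count in query_tf.items())
    norm = (math.sqrt(sum(v * v for v in doc_tf.values()))
            * math.sqrt(sum(v * v for v in query_tf.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    query = ["passage", "retrieval"]
    filler = ["x"] * 20
    # Same bag of words, different layout: keywords adjacent vs. far apart.
    focused = filler + ["passage", "retrieval"] + filler
    scattered = ["passage"] + filler + filler + ["retrieval"]
    for name, doc in (("focused", focused), ("scattered", scattered)):
        print(name, passage_score(doc, query), cosine_score(doc, query))
```

In this toy setup both documents contain exactly the same terms, so the whole-document cosine score cannot tell them apart, while the density peak is higher for the document whose keywords occur close together. This only illustrates the intuition for long documents and short queries; it does not reproduce the paper's experiments.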