Passage-Based Document Retrieval as a Tool for Text Mining with User's Information Needs

  • Authors:
  • Koichi Kise; Markus Junker; Andreas Dengel; Keinosuke Matsumoto

  • Venue:
  • DS '01 Proceedings of the 4th International Conference on Discovery Science
  • Year:
  • 2001

Abstract

Document retrieval can be considered a basic but important tool for text mining, one that can take a user's information need into account. However, retrieval becomes difficult when long, multi-topic documents must be retrieved from a very short description of the information need (a few keywords). In this paper, we focus on this problem, which is typical of real-world applications. We experimentally validate that passage-based document retrieval has an advantage over conventional document retrieval in such circumstances. Passage-based document retrieval judges a document's relevance to the information need using only small fractions (passages) of the document. As the passage-based method, we employ one based on density distributions of keywords. We compare it with three conventional document retrieval methods: the vector space model, pseudo-feedback, and latent semantic indexing. Experimental results show that the passage-based method is superior to the conventional methods when long documents have to be retrieved by short queries.
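To make the idea of keyword density distributions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the window shape, the window width, or any term weighting, so the Hann-shaped window, the default width of 5 tokens, the unit weight per keyword hit, and the function names (`hann_window`, `density_distribution`, `passage_score`) are all hypothetical choices. The sketch smooths keyword hits over token positions and scores a document by its densest passage.

```python
import math


def hann_window(width):
    """Hann-shaped weights for offsets -half..half (assumed window shape).

    `width` is expected to be odd; the center weight is 1.0.
    """
    half = width // 2
    return [0.5 * (1.0 + math.cos(math.pi * k / (half + 1)))
            for k in range(-half, half + 1)]


def density_distribution(tokens, query_terms, width=5):
    """Return the smoothed keyword density at every token position."""
    query = {term.lower() for term in query_terms}
    # Assumption: every keyword occurrence counts 1.0 (no idf-style weighting).
    hits = [1.0 if tok.lower() in query else 0.0 for tok in tokens]
    window = hann_window(width)
    half = width // 2
    density = []
    for pos in range(len(hits)):
        total = 0.0
        for offset, weight in zip(range(-half, half + 1), window):
            j = pos + offset
            if 0 <= j < len(hits):
                total += weight * hits[j]
        density.append(total)
    return density


def passage_score(tokens, query_terms, width=5):
    """Score a document by the peak of its density distribution."""
    return max(density_distribution(tokens, query_terms, width), default=0.0)


# Toy documents: the keywords are adjacent in one and far apart in the other.
query = ["passage", "retrieval"]
clustered = "x x x passage retrieval x x x x x".split()
scattered = "passage x x x x x x x x retrieval".split()
print(passage_score(clustered, query), passage_score(scattered, query))
```

In this toy setting, the document whose keywords occur close together receives the higher score, even though both documents contain each keyword exactly once. A whole-document bag-of-words representation would not distinguish between them, which is the intuition behind judging relevance from passages when documents are long and queries are short.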