Document retrieval is a basic but important tool for text mining that can take a user's information need into account. It becomes a hard task, however, when long, multi-topic documents must be retrieved from a very short description of the information need (a few keywords). In this paper, we focus on this problem, which is typical of real-world applications. We experimentally validate that passage-based document retrieval is advantageous in such circumstances compared with conventional document retrieval. Passage-based document retrieval judges a document's relevance to the information need using only small fractions (passages) of the document. As the passage-based method, we employ one based on density distributions of keywords. We compare it with three conventional document retrieval methods: the vector space model, pseudo-feedback, and latent semantic indexing. Experimental results show that the passage-based method is superior to the conventional methods when long documents must be retrieved with short queries.
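To make the idea of a keyword density distribution concrete, here is a minimal sketch. The abstract does not give the window shape, the window width, or the rule that turns a distribution into a document score, so all three are assumptions here: a Hann window, a width of 6 tokens, and the peak density as the score. The names `hann_weights` and `density_score` are illustrative only.

```python
import math


def hann_weights(width):
    """Hann weights for offsets -width//2 .. width//2 (assumed window shape)."""
    half = width // 2
    return {
        off: 0.5 * (1.0 + math.cos(2.0 * math.pi * off / width))
        for off in range(-half, half + 1)
    }


def density_score(tokens, query_terms, width=6):
    """Score a document by the peak of its keyword density distribution.

    The density at each position is the window-weighted count of query
    terms around that position. Taking the maximum as the document score
    is an assumption made for illustration.
    """
    query = set(query_terms)
    # 1.0 where a query keyword occurs, 0.0 elsewhere
    hits = [1.0 if tok in query else 0.0 for tok in tokens]
    weights = hann_weights(width)

    best = 0.0
    for centre in range(len(tokens)):
        density = 0.0
        for off, weight in weights.items():
            pos = centre + off
            if 0 <= pos < len(tokens):
                density += weight * hits[pos]
        best = max(best, density)
    return best


query = ["passage", "retrieval"]
filler = ["x"] * 20
# Keywords adjacent to each other inside a long document
focused = filler + ["passage", "retrieval"] + filler
# The same keywords spread far apart
scattered = ["passage"] + filler + ["retrieval"] + filler

focused_score = density_score(focused, query)
scattered_score = density_score(scattered, query)
```

Under these assumptions, `focused_score` exceeds `scattered_score` even though both documents contain each keyword exactly once. A whole-document term count cannot make that distinction, and this is the behaviour that matters for long, multi-topic documents.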