Representing documents using an explicit model of their similarities
Journal of the American Society for Information Science
The cluster hypothesis revisited
SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Query-sensitive similarity measures for the calculation of interdocument relationships
Proceedings of the tenth international conference on Information and knowledge management
Information Retrieval
Parsimonious language models for information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Corpus structure, language models, and ad hoc information retrieval
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Link-based similarity measures for the classification of Web documents
Journal of the American Society for Information Science and Technology
Language model information retrieval with document expansion
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Exploring the cluster hypothesis, and cluster-based retrieval, over the web
Proceedings of the 21st ACM international conference on Information and knowledge management
Hi-index | 0.00 |
Recently, cluster-based retrieval has been successfully applied to improve retrieval effectiveness. The core part of cluster-based retrieval is interdocument similarities. Although inter-document similarities can be investigated independently of cluster-based retrieval and be further improved in various ways, their direct evaluation has not been seriously considered. Considering that there are many cluster-based retrieval methods, such a direct evaluation method can separate the work of inter-document similarities from the work of cluster-based retrieval. For this purpose, this paper revisits Voorhee's nearest neighbor test as such a direct evaluation, by mainly focusing on whether or not the test is correlated to the retrieval effectiveness. Experimental results consistently verify the use of the nearest neighbor test. As a result, we conclude that the improvement of retrieval effectiveness can be well-predictable from direct evaluation, even without performing runs of cluster-based retrieval.