Revisit of nearest neighbor test for direct evaluation of inter-document similarities

Authors:
Seung-Hoon Na;In-Su Kang;Jong-Hyeok Lee
Affiliations:
POSTECH, Pohang, South Korea;KISTI, Daejeon, South Korea;POSTECH, Pohang, South Korea
Venue:
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Year:
2008

Abstract

Cluster-based retrieval has recently been applied successfully to improve retrieval effectiveness, and inter-document similarities are its core component. Although inter-document similarities can be studied independently of cluster-based retrieval and improved in various ways, their direct evaluation has received little serious attention. Because there are many cluster-based retrieval methods, a direct evaluation method would decouple work on inter-document similarities from work on cluster-based retrieval. To this end, this paper revisits Voorhees' nearest neighbor test as such a direct evaluation, focusing mainly on whether the test results correlate with retrieval effectiveness. Experimental results consistently support the use of the nearest neighbor test. We therefore conclude that improvements in retrieval effectiveness can be predicted well from direct evaluation, even without performing cluster-based retrieval runs.
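For readers unfamiliar with the nearest neighbor test, the following is a minimal sketch of the general idea, not the authors' implementation: for each document judged relevant to a query, count how many of its k nearest neighbors are also relevant. The function names, the sparse-dict document representation, the cosine similarity, the choice of candidate pool (all other documents), and the default k = 5 are illustrative assumptions; the abstract does not specify them.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbor_test(docs, relevant, similarity=cosine, k=5):
    """Histogram of how many of each relevant document's k neighbors are relevant.

    docs:     dict mapping doc id -> sparse term-weight dict (hypothetical format)
    relevant: set of doc ids judged relevant for a single query
    Returns a Counter mapping "relevant neighbors found" -> number of documents.
    """
    histogram = Counter()
    for doc_id in sorted(relevant):
        others = [d for d in docs if d != doc_id]
        # Rank by descending similarity; break ties by doc id for determinism.
        others.sort(key=lambda d: (-similarity(docs[doc_id], docs[d]), d))
        hits = sum(1 for d in others[:k] if d in relevant)
        histogram[hits] += 1
    return histogram


def mean_relevant_neighbors(histogram):
    """Average number of relevant neighbors per relevant document."""
    total = sum(histogram.values())
    return sum(h * n for h, n in histogram.items()) / total if total else 0.0
```

Under this sketch, a similarity measure that places more relevant documents among the neighbors of each relevant document scores higher. Swapping in a different `similarity` function lets two measures be compared on the same relevance judgments without running a cluster-based retrieval system.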