We present a study of the cluster hypothesis, and of the performance of cluster-based retrieval methods, over large-scale Web collections. Among our findings are that (i) the cluster hypothesis can hold, as determined by a specific test, for large-scale Web corpora to the same extent that it does for newswire corpora; (ii) while spam documents do not affect the extent to which the cluster hypothesis holds, they considerably affect the performance of cluster-based, as well as document-based, retrieval methods; and (iii) as is the case for newswire corpora, cluster-based methods can yield better performance than document-based methods for Web corpora.