Testing the cluster hypothesis in distributed information retrieval

Authors:
Fabio Crestani; Shengli Wu
Affiliations:
Department of Computer and Information Sciences, University of Strathclyde, Glasgow, UK; School of Computing and Mathematics, University of Ulster, Belfast, UK
Venue:
Information Processing and Management: an International Journal
Year:
2006



Abstract

How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based presentation of retrieval results rests on the cluster hypothesis, which states that documents that cluster together have similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of duplicate documents, and retrieval results of disparate quality are major features of a heterogeneous distributed information retrieval environment that might undermine the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that, although clustering is affected by differences in the representation and quality of retrieval results, the cluster hypothesis still holds, and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.
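
To make the cluster hypothesis concrete, the following is a minimal sketch of one common way to probe it: a nearest-neighbour check that asks, for each relevant document, what fraction of its most similar documents are also relevant. It is not the experimental procedure used in the paper. The function names (`cosine`, `nn_relevance_fraction`), the bag-of-words representation, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nn_relevance_fraction(docs, relevant, k=1):
    """Mean fraction of relevant documents among the k nearest
    neighbours of each relevant document (hypothetical helper).

    docs: dict mapping document id -> text
    relevant: set of ids judged relevant to one query
    """
    # Assumption: plain whitespace tokenisation, raw term frequencies.
    vectors = {d: Counter(text.lower().split()) for d, text in docs.items()}
    fractions = []
    for d in relevant:
        others = [o for o in vectors if o != d]
        others.sort(key=lambda o: cosine(vectors[d], vectors[o]), reverse=True)
        top = others[:k]
        if top:  # skip if there are no other documents to compare with
            fractions.append(sum(o in relevant for o in top) / len(top))
    return sum(fractions) / len(fractions) if fractions else 0.0


# Toy merged result list (invented data, not from the paper).
docs = {
    "d1": "cluster hypothesis retrieval",
    "d2": "cluster hypothesis test",
    "d3": "football match report",
    "d4": "football league table",
}
relevant = {"d1", "d2"}
score = nn_relevance_fraction(docs, relevant, k=1)
print(score)
```

A value near 1 suggests that relevant documents sit close to one another, which is what the hypothesis predicts; a value near the overall proportion of relevant documents suggests no such grouping. In a heterogeneous distributed setting, the document texts would come from differently formatted result records, which is one reason the similarity scores may be less reliable there.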