Mixture model with multiple centralized retrieval algorithms for result merging in federated search

Authors:
Dzung Hong; Luo Si
Affiliations:
Purdue University, West Lafayette, IN, USA; Purdue University, West Lafayette, IN, USA
Venue:
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Year:
2012

Abstract

Result merging is an important research problem in federated search: documents retrieved as ranked lists from multiple selected information sources must be merged into a single list. State-of-the-art result merging algorithms such as Semi-Supervised Learning (SSL) and Sample-Agglomerate Fitting Estimate (SAFE) map document scores from different sources to comparable scores according to a single centralized retrieval algorithm for ranking those documents. Both SSL and SAFE select that single centralized retrieval algorithm arbitrarily, which is problematic in a heterogeneous federated search environment, since a single centralized algorithm is often suboptimal for different information sources. Based on this observation, this paper proposes a novel approach to result merging that utilizes multiple centralized retrieval algorithms. One simple approach is to learn a single set of combination weights for the multiple centralized retrieval algorithms (e.g., by logistic regression) and use them to compute comparable document scores. The paper shows that this simple approach generates suboptimal results because it is not flexible enough to deal with heterogeneous information sources. A mixture probabilistic model is therefore proposed, which uses some training data to learn more appropriate combination weights for different types of information sources. Extensive experiments on three datasets demonstrate the effectiveness of the proposed approach.
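
The contrast between a single set of combination weights and source-type-dependent weights can be illustrated with a minimal Python sketch. This is only one possible reading of the abstract, not the paper's actual model: the logistic form of the mixture, the function names, the per-source type probabilities, and all numeric values below are hypothetical, and no training step is shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combined_score(features, weights, bias=0.0):
    """Simple approach: one weight vector combines the scores that
    several centralized retrieval algorithms assign to a document."""
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))

def mixture_score(features, type_probs, type_weights, type_biases):
    """Mixture sketch (assumed form): each latent source type has its own
    weights; the result is averaged by the source's type probabilities."""
    return sum(
        p * combined_score(features, w, b)
        for p, w, b in zip(type_probs, type_weights, type_biases)
    )

# Hypothetical values; in practice these would be learned from training data.
SOURCE_TYPE_PROBS = {"A": [0.9, 0.1], "B": [0.2, 0.8]}
TYPE_WEIGHTS = [[2.0, 0.5], [0.5, 2.0]]
TYPE_BIASES = [-1.0, -1.0]

def merge(results):
    """results: list of (doc_id, source_id, centralized_scores).
    Returns doc ids ordered by descending comparable score."""
    scored = [
        (mixture_score(feats, SOURCE_TYPE_PROBS[src], TYPE_WEIGHTS, TYPE_BIASES), doc)
        for doc, src, feats in results
    ]
    return [doc for _, doc in sorted(scored, key=lambda t: t[0], reverse=True)]

# Toy input: each document carries scores from two centralized algorithms.
results = [
    ("a1", "A", [0.8, 0.2]),
    ("b1", "B", [0.3, 0.6]),
    ("a2", "A", [0.1, 0.9]),
]
merged = merge(results)
```

With a single latent type, `mixture_score` reduces to `combined_score`, which corresponds to the simple approach that the abstract reports as suboptimal for heterogeneous sources.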