Result merging is an important research problem in federated search: the ranked lists of documents returned by the selected information sources must be merged into a single list. State-of-the-art result merging algorithms such as Semi-Supervised Learning (SSL) and Sample-Agglomerate Fitting Estimate (SAFE) map the document scores returned by different sources to comparable scores, namely the scores that a single centralized retrieval algorithm would assign when ranking those documents. Both SSL and SAFE arbitrarily select that single centralized retrieval algorithm, which is problematic in a heterogeneous federated search environment, since any single centralized algorithm is often suboptimal across different information sources. Based on this observation, this paper proposes a novel approach to result merging that utilizes multiple centralized retrieval algorithms. One simple approach is to learn one set of combination weights over the multiple centralized retrieval algorithms (e.g., by logistic regression) and to compute comparable document scores from the weighted combination. The paper shows that this simple approach generates suboptimal results because it is not flexible enough to deal with heterogeneous information sources. A probabilistic mixture model is therefore proposed that uses some training data to learn combination weights appropriate to different types of information sources. An extensive set of experiments on three datasets demonstrates the effectiveness of the proposed approach.
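The score-mapping idea described above can be illustrated with a minimal Python sketch. It is not the paper's model: it assumes a linear map from each source's scores to each centralized algorithm's scores, fitted by least squares on documents that both have scored, and it takes the per-source combination weights as given rather than learning them with a mixture model. All names (`fit_linear_map`, `merge_results`, `weights_by_source`) and the toy numbers are hypothetical.

```python
from statistics import mean


def fit_linear_map(source_scores, central_scores):
    """Least-squares fit of central ~= a * source + b on overlap documents."""
    mx, my = mean(source_scores), mean(central_scores)
    var = sum((x - mx) ** 2 for x in source_scores)
    if var == 0:
        # Degenerate case: all source scores equal, fall back to the mean.
        return 0.0, my
    cov = sum((x - mx) * (y - my)
              for x, y in zip(source_scores, central_scores))
    a = cov / var
    return a, my - a * mx


def merge_results(results, overlap, weights_by_source):
    """Merge per-source ranked lists into one list of (doc_id, score).

    results:           {source: [(doc_id, source_score), ...]}
    overlap:           {source: [(source_score, [central_score_per_alg, ...]), ...]}
    weights_by_source: {source: [weight_per_alg, ...]}  (assumed given, not learned)
    """
    merged = []
    for source, docs in results.items():
        pairs = overlap[source]
        xs = [s for s, _ in pairs]
        weights = weights_by_source[source]
        # One linear map per centralized retrieval algorithm.
        maps = [fit_linear_map(xs, [c[k] for _, c in pairs])
                for k in range(len(weights))]
        for doc_id, s in docs:
            # Comparable score: weighted combination of the mapped scores.
            score = sum(w * (a * s + b) for w, (a, b) in zip(weights, maps))
            merged.append((doc_id, score))
    return sorted(merged, key=lambda t: t[1], reverse=True)


# Toy data: two sources, two centralized algorithms.
overlap = {
    "A": [(0.0, [0.0, 1.0]), (2.0, [1.0, 2.0]), (4.0, [2.0, 3.0])],
    "B": [(8.0, [1.0, 2.0]), (16.0, [2.0, 4.0]), (24.0, [3.0, 6.0])],
}
results = {
    "A": [("a1", 4.0), ("a2", 1.0)],
    "B": [("b1", 16.0), ("b2", 8.0)],
}
weights_by_source = {"A": [1.0, 0.0], "B": [0.5, 0.5]}

merged = merge_results(results, overlap, weights_by_source)
print([doc_id for doc_id, _ in merged])
```

Giving each source its own weight vector is the simplest way to show why a single global set of weights can be too rigid. In the approach the abstract describes, such weights would instead be learned from training data for different types of information sources.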