A semisupervised learning method to merge search engine results

Authors:
Luo Si;Jamie Callan
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2003

Citing 26
Cited 51

Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
The effectiveness of GlOSS for the text database discovery problem

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
TREC and TIPSTER experiments with INQUERY

TREC-2 Proceedings of the second conference on Text retrieval conference
Dissemination of collection wide information in a distributed information retrieval system

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Searching distributed collections with inference networks

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Learning collection fusion strategies

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
STARTS: Stanford proposal for Internet meta-searching

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Analyses of multiple evidence combination

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Effective retrieval with distributed collections

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling approach to information retrieval

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Methods for information server selection

ACM Transactions on Information Systems (TOIS)
Comparing the performance of database selection algorithms

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Cluster-based language models for distributed retrieval

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A general language model for information retrieval (poster abstract)

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
A decision-theoretic approach to database selection in networked IR

ACM Transactions on Information Systems (TOIS)
GlOSS: text-source discovery over the Internet

ACM Transactions on Database Systems (TODS)
Server selection on the World Wide Web

DL '00 Proceedings of the fifth ACM conference on Digital libraries
The impact of database selection on distributed searching

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Database merging strategy based on logistic regression

Information Processing and Management: an International Journal
Collection selection and results merging with topically organized U.S. patents and TREC data

Proceedings of the ninth international conference on Information and knowledge management
Query-based sampling of text databases

ACM Transactions on Information Systems (TOIS)
Modeling score distributions for combining the outputs of search engines

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Models for metasearch

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Using sampled data and regression to merge search engine results

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
A language modeling framework for resource selection and results merging

Proceedings of the eleventh international conference on Information and knowledge management
Distributed search over the hidden web: hierarchical database sampling and selection

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Cited by:

Merging retrieval results in hierarchical peer-to-peer networks

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Merging Results for Distributed Content Based Image Retrieval

Multimedia Tools and Applications
Unified utility maximization framework for resource selection

Proceedings of the thirteenth ACM international conference on Information and knowledge management
Modeling search engine effectiveness for federated search

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Collaborative research - digital government: a language modeling approach to metadata for cross-database linkage and search

dg.o '04 Proceedings of the 2004 annual national conference on Digital government research
Distributed text retrieval from overlapping collections

ADC '07 Proceedings of the eighteenth conference on Australasian database - Volume 63
Federated text retrieval from uncooperative overlapped collections

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Updating collection representations for federated search

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
The DILIGENT framework for distributed information retrieval

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Hybrid results merging

Proceedings of the sixteenth ACM conference on information and knowledge management
A results merging algorithm for distributed information retrieval environments that combines regression methodologies with a selective download phase

Information Processing and Management: an International Journal
Enhancing web search by promoting multiple search engine use

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
A study of learning a merge model for multilingual information retrieval

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Integral based source selection for uncooperative distributed information retrieval environments

Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
Robust result merging using sample-based score estimates

ACM Transactions on Information Systems (TOIS)
Joint Ranking for Multilingual Web Search

ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Simple Adaptations of Data Fusion Algorithms for Source Selection

ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
SUSHI: scoring scaled samples for server selection

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Effective query expansion for federated search

Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Server selection methods in personal metasearch: a comparative empirical study

Information Retrieval
Multi-agent System for Personalizing Information Source Selection

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Learning from past queries for resource selection

Proceedings of the 18th ACM conference on Information and knowledge management
A study of a weighting scheme for information retrieval in hierarchical peer-to-peer networks

ECIR'07 Proceedings of the 29th European conference on IR research
A decision-theoretic model for decentralised query routing in hierarchical peer-to-peer networks

ECIR'07 Proceedings of the 29th European conference on IR research
Central-rank-based collection selection in uncooperative distributed information retrieval

ECIR'07 Proceedings of the 29th European conference on IR research
Results merging algorithm using multiple regression models

ECIR'07 Proceedings of the 29th European conference on IR research
Segmentation of search engine results for effective data-fusion

ECIR'07 Proceedings of the 29th European conference on IR research
Semi-supervised document classification with a mislabeling error model

ECIR'08 Proceedings of the 30th European conference on IR research on Advances in information retrieval
Collection-integral source selection for uncooperative distributed information retrieval environments

Information Sciences: an International Journal
Personalised distributed information retrieval-based agents

International Journal of Intelligent Systems Technologies and Applications
A joint probabilistic classification model for resource selection

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
LESSON: A system for lecture notes searching and sharing over Internet

Journal of Systems and Software
Modeling information sources as integrals for effective and efficient source selection

Information Processing and Management: an International Journal
A profile-based aggregation model in a peer-to-peer information retrieval system

Globe'10 Proceedings of the Third international conference on Data management in grid and peer-to-peer systems
PISA: A framework for integrating uncooperative peers into P2P-based federated search

Computer Communications
Federated Search

Foundations and Trends in Information Retrieval
A weighted curve fitting method for result merging in federated search

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Learning a merge model for multilingual information retrieval

Information Processing and Management: an International Journal
Integrating Fusion Techniques into the Collaborative Filtering Search-Based Recommender Systems

WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
A fuzzy integral method to merge search engine results on web

CIS'05 Proceedings of the 2005 international conference on Computational Intelligence and Security - Volume Part II
CLEF 2005: multilingual retrieval by combining multiple multilingual ranked lists

CLEF'05 Proceedings of the 6th international conference on Cross-Language Evaluation Forum: accessing Multilingual Information Repositories
Federated search of text-based digital libraries in hierarchical peer-to-peer networks

ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Peer-to-Peer Information Retrieval: An Overview

ACM Transactions on Information Systems (TOIS)
Comparing different architectures for query routing in peer-to-peer networks

ECIR'06 Proceedings of the 28th European conference on Advances in Information Retrieval
Mixture model with multiple centralized retrieval algorithms for result merging in federated search

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Utilizing inter-document similarities in federated search

SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
A grid-based infrastructure for distributed retrieval

ECDL'07 Proceedings of the 11th European conference on Research and Advanced Technology for Digital Libraries
Studying the clustering paradox and scalability of search in highly distributed environments

ACM Transactions on Information Systems (TOIS)
Search result diversification in resource selection for federated search

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Integrating collaborative filtering and matching-based search for product recommendations

Journal of Theoretical and Applied Electronic Commerce Research
Merging algorithms for enterprise search

Proceedings of the 18th Australasian Document Computing Symposium

Abstract

The proliferation of searchable text databases on local area networks and the Internet creates the problem of finding information that may be distributed among many disjoint text databases (distributed information retrieval). How to merge the results returned by selected databases is an important subproblem of the distributed information retrieval task. Previous research assumed either that resource providers cooperate to provide normalizing statistics or that search clients download all retrieved documents and compute normalized scores without cooperation from resource providers.

This article presents a semisupervised learning solution to the result merging problem. The key contribution is the observation that information used to create resource descriptions for resource selection can also be used to create a centralized sample database to guide the normalization of document scores returned by different databases. At retrieval time, the query is sent to the selected databases, which return database-specific document scores, and to a centralized sample database, which returns database-independent document scores. Documents that have both a database-specific score and a database-independent score serve as training data for learning to normalize the scores of other documents. An extensive set of experiments demonstrates that this method is more effective than the well-known CORI result-merging algorithm under a variety of conditions.
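
The merging step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which model is used to learn the normalization, so the code below assumes a simple per-database linear mapping fit by least squares. The function names (`fit_linear_map`, `merge_results`), the data layout, and the example scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def fit_linear_map(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b over (x, y) pairs.

    x is a database-specific score, y is the database-independent score
    of the same document in the centralized sample database.
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two overlap documents")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    if var_x == 0:
        raise ValueError("database-specific scores are all identical")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def merge_results(
    db_results: Dict[str, Dict[str, float]],
    sample_scores: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Normalize each database's scores and return one merged ranking.

    db_results maps database name -> {doc_id: database-specific score}.
    sample_scores maps doc_id -> database-independent score, for the
    documents that also appear in the centralized sample database.
    """
    merged: List[Tuple[str, float]] = []
    for results in db_results.values():
        # Overlap documents (those with both scores) are the training data.
        pairs = [
            (score, sample_scores[doc_id])
            for doc_id, score in results.items()
            if doc_id in sample_scores
        ]
        a, b = fit_linear_map(pairs)
        # Apply the learned mapping to every document from this database.
        merged.extend((doc_id, a * score + b) for doc_id, score in results.items())
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged


# Illustrative (made-up) scores on two incompatible scales.
db_results = {
    "db_a": {"a1": 10.0, "a2": 8.0, "a3": 6.0},
    "db_b": {"b1": 0.9, "b2": 0.5, "b3": 0.1},
}
sample_scores = {"a1": 0.8, "a3": 0.4, "b1": 0.5, "b3": 0.1}

merged = merge_results(db_results, sample_scores)
print([doc_id for doc_id, _ in merged])
```

In this toy example, documents `a2` and `b2` are absent from the sample database, yet they receive comparable scores through the mappings learned from their databases' overlap documents. A real system would also need a fallback for databases with too few overlap documents, which this sketch handles only by raising an error.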