Generative model-based metasearch for data fusion in information retrieval

Authors:
Miles Efron
Affiliations:
University of Texas, Austin, TX, USA
Venue:
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
Year:
2009

Abstract

"Data fusion" refers to the problem in information retrieval (IR) where several lists of documents ranked against a query are to be merged into a single ranked list for presentation to a user. Data fusion is also known as "metasearch." In a digital library setting data fusion may support operations such as federated search based on multiple repository representations. This paper presents a novel approach to the fusion problem: generative model-based Metasearch (GeM). We suggest viewing the appearance of documents in a return set as the outcome of a probabilistic process; some documents are likely to occur in the model, while others are unlikely. Using Bayesian parameter estimation to fit a multinomial distribution based on the return sets to be merged, GeM achieves a final ranking by listing documents in decreasing probability of generation under the induced model. We also introduce what we call "the impatient reader" approach to normalizing document ranks in service to the fusion operation. We report results from several experiments on TREC data suggesting that GeM, informed with impatient reader document scores, operates at state-of-the-art levels of effectiveness.