Hybrid results merging

Authors:
Georgios Paltoglou;Michail Salampasis;Maria Satratzemi
Affiliations:
University of Macedonia, Thessaloniki, Greece;Alexander Educational Technological Institute of Thessaloniki, Thessaloniki, Greece;University of Macedonia, Thessaloniki, Greece
Venue:
Proceedings of the sixteenth ACM conference on information and knowledge management
Year:
2007

Abstract

The problem of results merging in distributed information retrieval environments has been approached from two different directions in research. Estimation approaches attempt to calculate the relevance of the returned documents through ad hoc methodologies (weighted score merging, regression, etc.), while download approaches download all the documents locally, partially or completely, in order to estimate their relevance "first hand". Both have their advantages and disadvantages. Download algorithms are assumed to be more effective, but they are very expensive in terms of time and bandwidth. Estimation approaches, on the other hand, usually rely on document relevance scores being returned by the remote collections in order to achieve maximum performance. In addition, regression algorithms, which have proved to be more effective than weighted score merging, rely on a significant number of overlap documents in order to function effectively, which in practice requires multiple interactions with the remote collections. The new algorithm introduced here reconciles the two approaches, combining their strengths while minimizing their weaknesses. It is based on downloading a limited, selected number of documents from the remote collections and estimating the relevance of the rest through regression methodologies. The proposed algorithm is tested in a variety of settings, and its performance is found to be better than that of estimation approaches while approximating that of download approaches.
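
The hybrid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which documents are selected for download or which regression form is used. The sketch assumes that each remote collection returns a ranked list with scores, that the downloaded documents are taken at evenly spaced ranks, and that a simple least-squares line maps remote scores to locally computed scores. The names `hybrid_merge`, `fit_line`, `score_locally` and `sample_size` are hypothetical; `score_locally` stands in for downloading a document and scoring it against the query.

```python
from typing import Callable, Dict, List, Tuple

Ranked = List[Tuple[str, float]]  # (doc_id, remote_score), best first


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a * x + b (needs at least one point)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Degenerate sample: fall back to a constant prediction.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def hybrid_merge(
    results: Dict[str, Ranked],
    score_locally: Callable[[str], float],
    sample_size: int = 3,
) -> Ranked:
    """Merge per-collection result lists into one list, best first."""
    merged: Ranked = []
    for ranked in results.values():
        if not ranked:
            continue
        n = len(ranked)
        k = min(sample_size, n)
        # Assumption: download documents at evenly spaced ranks.
        if k == 1:
            picks = {0}
        else:
            picks = {round(i * (n - 1) / (k - 1)) for i in range(k)}
        # "Download" step: first-hand scores for the selected documents only.
        known = {i: score_locally(ranked[i][0]) for i in picks}
        # Estimation step: regress local score on remote score.
        order = sorted(known)
        a, b = fit_line([ranked[i][1] for i in order], [known[i] for i in order])
        for i, (doc_id, remote_score) in enumerate(ranked):
            score = known[i] if i in known else a * remote_score + b
            merged.append((doc_id, score))
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged
```

Only `sample_size` documents per collection are scored first hand; every other document receives a regression estimate on the same scale, so the lists from different collections become comparable without downloading everything.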