Hybrid results merging

  • Authors:
  • Georgios Paltoglou; Michail Salampasis; Maria Satratzemi

  • Affiliations:
  • University of Macedonia, Thessaloniki, Greece; Alexander Educational Technological Institute of Thessaloniki, Thessaloniki, Greece; University of Macedonia, Thessaloniki, Greece

  • Venue:
  • Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07)
  • Year:
  • 2007

Abstract

The problem of results merging in distributed information retrieval environments has been approached from two different directions in research. Estimation approaches attempt to calculate the relevance of the returned documents through ad-hoc methodologies (weighted score merging, regression, etc.), while download approaches download the returned documents locally, either partially or completely, in order to estimate their relevance "first hand". Both have their advantages and disadvantages. Download algorithms are assumed to be more effective, but they are very expensive in terms of time and bandwidth. Estimation approaches, on the other hand, usually rely on the remote collections returning document relevance scores in order to achieve maximum performance. In addition, regression algorithms, which have proved more effective than weighted score merging, need a significant number of overlap documents to function effectively, in practice requiring multiple interactions with the remote collections. The new algorithm introduced here reconciles the two approaches, combining their strengths while minimizing their weaknesses. It downloads a limited, selected number of documents from the remote collections and estimates the relevance of the rest through regression methodologies. The proposed algorithm is tested in a variety of settings, and its performance is found to be better than that of estimation approaches while approximating that of download approaches.
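The abstract describes the hybrid idea only at a high level: score a small, selected set of downloaded documents directly, then estimate the relevance of the remaining documents by regression. The Python sketch below is one possible reading of that idea, not the authors' algorithm. Several things are assumptions: "selected" is taken to mean the top `k` documents of each result list; the hypothetical `local_score` callable stands in for scoring a downloaded document against a central index; and the regression is a per-collection least-squares line from rank to local score, with rank used because the remote collections may not return scores at all.

```python
from typing import Callable, Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate case (all xs equal): fall back to predicting the mean.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hybrid_merge(
    result_lists: Dict[str, List[str]],
    local_score: Callable[[str], float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Merge ranked lists from several collections into one scored list.

    Assumptions (not taken from the paper): the top-k documents of each list
    are the "downloaded" ones, local_score is a hypothetical stand-in for
    scoring a downloaded document centrally, and the other documents' scores
    are estimated by a per-collection linear regression on rank.
    Document ids are assumed to be unique across collections.
    """
    if k < 2:
        raise ValueError("need at least two downloaded documents to fit a line")
    merged: List[Tuple[str, float]] = []
    for docs in result_lists.values():
        downloaded, rest = docs[:k], docs[k:]
        # "First-hand" relevance for the downloaded documents.
        scores = [local_score(doc) for doc in downloaded]
        merged.extend(zip(downloaded, scores))
        if rest:
            # rest is non-empty only when len(downloaded) == k >= 2,
            # so the regression is always well defined here.
            ranks = [float(r) for r in range(1, len(downloaded) + 1)]
            slope, intercept = fit_line(ranks, scores)
            for rank, doc in enumerate(rest, start=len(downloaded) + 1):
                merged.append((doc, slope * rank + intercept))
    # Single merged list: highest (actual or estimated) score first.
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged


if __name__ == "__main__":
    # Toy data: two collections, hypothetical local scores for the
    # downloaded documents only ("a4" is never downloaded).
    lists = {"A": ["a1", "a2", "a3", "a4"], "B": ["b1", "b2", "b3"]}
    local = {"a1": 0.9, "a2": 0.7, "a3": 0.5, "b1": 0.8, "b2": 0.6, "b3": 0.1}
    print([doc for doc, _ in hybrid_merge(lists, local.__getitem__, k=3)])
    # ['a1', 'b1', 'a2', 'b2', 'a3', 'a4', 'b3']
```

In the toy run, the fit for collection A (ranks 1 to 3, scores 0.9, 0.7, 0.5) gives a slope of -0.2 and an intercept of 1.1, so the undownloaded `a4` is estimated at 0.3 and placed above `b3`. Increasing `k` trades extra downloads for more reliable regression estimates, which is the cost/effectiveness balance the abstract describes.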