Experiments on Adaptive Set Intersections for Text Retrieval Systems

  • Authors:
  • Erik D. Demaine; Alejandro López-Ortiz; J. Ian Munro

  • Venue:
  • ALENEX '01 Revised Papers from the Third International Workshop on Algorithm Engineering and Experimentation
  • Year:
  • 2001

Abstract

In [3] we introduced an adaptive algorithm for computing the intersection of k sorted sets within a factor of at most 8k comparisons of the information-theoretic lower bound, under a model that deals with an encoding of the shortest proof of the answer. This adaptive algorithm performs better on "burstier" inputs than a straightforward worst-case optimal method. Indeed, we have shown that, subject to a reasonable measure of instance difficulty, the algorithm adapts optimally up to a constant factor. This paper explores how the algorithm behaves under actual data distributions, compared with standard algorithms. We present experiments for searching 114 megabytes of text from the World Wide Web using 5,000 actual user queries from a commercial search engine. The experiments show that the theoretically optimal adaptive algorithm is not always optimal in practice, given the distribution of WWW text data. We then study several improvement techniques for the standard algorithms. These techniques combine improvements suggested by the observed distribution of the data with the theoretical results from [3]. We perform controlled experiments on these techniques to determine which ones improve performance, yielding an algorithm that outperforms existing algorithms in most cases.
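
To make the idea of adapting to "bursty" inputs concrete, the following is a minimal sketch of one way to intersect k sorted lists with galloping (doubling) search in round-robin order. It is an illustration only and should not be read as the algorithm of [3] or as any of the variants evaluated in the paper. The names `gallop` and `adaptive_intersection` are hypothetical, and the sketch assumes each input is a strictly increasing Python list.

```python
from bisect import bisect_left


def gallop(seq, target, lo):
    """Smallest index i >= lo with seq[i] >= target, or len(seq) if none.

    Probes at doubling distances from lo, then binary-searches the last gap,
    so the cost grows with the logarithm of the distance skipped.
    """
    n = len(seq)
    if lo >= n or seq[lo] >= target:
        return lo
    step = 1
    hi = lo + 1
    while hi < n and seq[hi] < target:
        lo = hi
        step *= 2
        hi = lo + step
    # Invariant: seq[lo] < target, and the answer lies in (lo, min(hi, n)].
    return bisect_left(seq, target, lo + 1, min(hi, n))


def adaptive_intersection(sets):
    """Intersect k strictly increasing lists (illustrative sketch).

    A candidate value is carried around the lists in cyclic order. Each list
    gallops forward to the candidate; a miss replaces the candidate with the
    next larger element found, and k consecutive hits confirm a common element.
    """
    k = len(sets)
    if k == 0 or any(len(s) == 0 for s in sets):
        return []
    if k == 1:
        return list(sets[0])

    pos = [0] * k            # current search position in each list
    result = []
    candidate = sets[0][0]   # value currently being tested
    agree = 1                # consecutive lists known to contain candidate
    i = 1
    while True:
        s = sets[i]
        pos[i] = gallop(s, candidate, pos[i])
        if pos[i] == len(s):
            return result    # one list is exhausted: nothing more can match
        if s[pos[i]] == candidate:
            agree += 1
            if agree == k:
                result.append(candidate)
                pos[i] += 1
                if pos[i] == len(s):
                    return result
                candidate = s[pos[i]]
                agree = 1
        else:
            candidate = s[pos[i]]
            agree = 1
        i = (i + 1) % k


if __name__ == "__main__":
    a = [1, 3, 5, 7, 9, 11]
    b = [3, 4, 5, 9, 10, 11, 20]
    c = [0, 3, 9, 11, 12]
    print(adaptive_intersection([a, b, c]))
```

When the lists interleave in long runs, each gallop skips a whole run in logarithmic time, which is the kind of input on which an adaptive method is expected to beat a fixed merging strategy. On finely interleaved data the doubling probes add overhead, which is consistent with the abstract's observation that theoretical optimality does not always carry over to practice.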