Matching bounds for the all-pairs MapReduce problem

  • Authors: Foto Afrati; Jeffrey Ullman

  • Affiliations: National Technical University of Athens; Stanford University

  • Venue: Proceedings of the 17th International Database Engineering & Applications Symposium
  • Year: 2013

Abstract

The all-pairs problem is an input-output relationship in which each output corresponds to a pair of inputs, and each pair of inputs has a corresponding output. It models similarity joins in which no shortcut for the search for similar pairs, such as locality-sensitive hashing, is possible, so each input must be compared with every other input to determine which pairs are "similar." For MapReduce implementations of this problem, there was a gap of a factor of 2 between the lower bound on the necessary communication and the communication required by the best known algorithm. In this brief paper we show that the lower bound can essentially be met.
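
To make the communication gap concrete, the sketch below builds a simple group-based scheme and compares its cost with a counting lower bound. It is an illustration under stated assumptions, not the algorithm of the paper. It assumes that communication is measured as the replication rate r (the average number of reducers each input is sent to) and that each reducer holds at most q inputs. The function names and the round-robin grouping are hypothetical choices made for this example. The bound follows from counting: a reducer with q inputs covers at most q(q-1)/2 pairs, and all n(n-1)/2 pairs must be covered, so r >= (n-1)/(q-1), roughly n/q. The group-based scheme splits the inputs into g groups and uses one reducer per pair of groups, giving r = g-1 with q = 2n/g, roughly 2n/q.

```python
from itertools import combinations


def group_pair_reducers(n, g):
    """Hypothetical baseline: split inputs 0..n-1 into g groups (round-robin)
    and create one reducer for every unordered pair of groups."""
    groups = [list(range(k, n, g)) for k in range(g)]
    return [groups[a] + groups[b] for a, b in combinations(range(g), 2)]


def replication_rate(reducers, n):
    """Average number of reducers that each input is sent to."""
    return sum(len(r) for r in reducers) / n


def covers_all_pairs(reducers, n):
    """Check that every pair of inputs meets at some reducer."""
    covered = set()
    for r in reducers:
        covered.update(combinations(sorted(r), 2))
    return len(covered) == n * (n - 1) // 2


def lower_bound(n, q):
    """Counting bound on the replication rate: r >= (n - 1) / (q - 1)."""
    return (n - 1) / (q - 1)


n, g = 12, 4                              # assumed toy sizes
reducers = group_pair_reducers(n, g)
q = max(len(r) for r in reducers)         # largest reducer size
print(covers_all_pairs(reducers, n))      # whether every pair is compared
print(replication_rate(reducers, n))      # cost of the group-based scheme
print(lower_bound(n, q))                  # counting lower bound for this q
```

For these toy sizes the scheme's replication rate is g-1 = 3, against a bound of 11/5. As n and g grow with q fixed at 2n/g, the ratio of the two approaches 2, which is the factor-of-2 gap described in the abstract.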