Communication steps for parallel query processing

  • Authors:
  • Paul Beame; Paraschos Koutris; Dan Suciu

  • Affiliations:
  • University of Washington, Seattle, USA; University of Washington, Seattle, USA; University of Washington, Seattle, USA

  • Venue:
  • Proceedings of the 32nd symposium on Principles of database systems
  • Year:
  • 2013

Abstract

We consider the problem of computing a relational query q on a large input database of size n, using a large number p of servers. The computation is performed in rounds, and each server can receive only O(n/p^(1-ε)) bits of data, where ε ∈ [0,1] is a parameter that controls replication. We examine how many global communication steps are needed to compute q. We establish both lower and upper bounds, in two settings. For a single round of communication, we give lower bounds in the strongest possible model, where arbitrary bits may be exchanged; we show that any algorithm requires ε ≥ 1 - 1/τ*, where τ* is the fractional vertex cover of the hypergraph of q. We also give an algorithm that matches the lower bound for a specific class of databases. For multiple rounds of communication, we present lower bounds in a model where routing decisions for a tuple are tuple-based. We show that for the class of tree-like queries there exists a tradeoff between the number of rounds and the space exponent ε. The lower bounds for multiple rounds are the first of their kind. Our results also imply that transitive closure cannot be computed in O(1) rounds of communication.
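
To make the one-round bound ε ≥ 1 - 1/τ* concrete, the following is a minimal sketch, not code from the paper. It assumes the query hypergraph is given as a list of atoms (each a tuple of variable names), and it certifies τ* by LP duality: a fractional vertex cover and a fractional edge packing with equal total weight are both optimal. All function names and the triangle example are illustrative assumptions.

```python
from fractions import Fraction


def is_vertex_cover(edges, vertex_weights):
    """Check that every atom (hyperedge) has total vertex weight >= 1."""
    return all(w >= 0 for w in vertex_weights.values()) and all(
        sum(vertex_weights[x] for x in edge) >= 1 for edge in edges
    )


def is_edge_packing(edges, edge_weights):
    """Check that every variable (vertex) carries total edge weight <= 1."""
    vertices = {x for edge in edges for x in edge}
    return all(w >= 0 for w in edge_weights) and all(
        sum(w for edge, w in zip(edges, edge_weights) if x in edge) <= 1
        for x in vertices
    )


def certified_tau_star(edges, vertex_weights, edge_weights):
    """Return tau* if the cover and packing certify each other.

    By weak duality any packing total is at most any cover total,
    so equal totals mean both are optimal.
    """
    if not is_vertex_cover(edges, vertex_weights):
        raise ValueError("vertex weights do not form a fractional cover")
    if not is_edge_packing(edges, edge_weights):
        raise ValueError("edge weights do not form a fractional packing")
    cover_total = sum(vertex_weights.values())
    packing_total = sum(edge_weights)
    if cover_total != packing_total:
        raise ValueError("totals differ, so optimality is not certified")
    return cover_total


def min_space_exponent(tau_star):
    """One-round lower bound on the space exponent: 1 - 1/tau*."""
    return 1 - 1 / Fraction(tau_star)


# Hypothetical example: the triangle query R(x,y), S(y,z), T(z,x).
half = Fraction(1, 2)
triangle = [("x", "y"), ("y", "z"), ("z", "x")]
tau = certified_tau_star(
    triangle,
    {"x": half, "y": half, "z": half},  # candidate fractional vertex cover
    [half, half, half],                 # candidate fractional edge packing
)
eps = min_space_exponent(tau)
print(tau, eps)
```

For the triangle query, weight 1/2 on each variable and on each atom gives τ* = 3/2, so the bound reads ε ≥ 1/3. A query with τ* = 1, such as a two-way join on a shared variable, gives ε ≥ 0, meaning the bound does not force any replication.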