Parallel evaluation of conjunctive queries

Authors:
Paraschos Koutris; Dan Suciu
Affiliations:
University of Washington, Seattle, WA, USA; University of Washington, Seattle, WA, USA
Venue:
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2011

Abstract

The availability of large data centers with tens of thousands of servers has led to the popular adoption of massive parallelism for data analysis on large datasets. Several query languages exist for running queries on massively parallel architectures, some based on the MapReduce infrastructure, others using proprietary implementations. Motivated by this trend, this paper analyzes the parallel complexity of conjunctive queries. We propose a very simple model of parallel computation that captures these architectures, in which the complexity parameter is the number of parallel steps requiring synchronization of all servers. We study the complexity of conjunctive queries and give a complete characterization of the queries which can be computed in one parallel step. These form a strict subset of hierarchical queries, and include flat queries like R(x,y), S(x,z), T(x,v), U(x,w), tall queries like R(x), S(x,y), T(x,y,z), U(x,y,z,w), and combinations thereof, which we call tall-flat queries. We describe an algorithm for computing in parallel any tall-flat query, and prove that any query that is not tall-flat cannot be computed in one step in this model. Finally, we present extensions of our results to queries that are not tall-flat.
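To make the one-step idea concrete, here is a minimal sketch of how the flat query R(x,y), S(x,z), T(x,v), U(x,w) from the abstract might be evaluated with a single communication step. It is not the paper's algorithm: it assumes plain hash partitioning on the shared variable x, simulates the servers as in-memory lists, and returns all bindings of (x, y, z, v, w), which is an assumption because the abstract does not state the query head. The function names (`partition`, `local_flat_join`, `one_step_flat_query`) and the `num_servers` parameter are hypothetical. Load balancing under skewed values of x is not addressed.

```python
from itertools import product


def partition(relation, num_servers):
    """Communication step: route each tuple to a server by hashing x,
    assumed here to be the first attribute of every relation."""
    shards = [[] for _ in range(num_servers)]
    for tup in relation:
        shards[hash(tup[0]) % num_servers].append(tup)
    return shards


def local_flat_join(r, s, t, u):
    """Local step: join R(x,y), S(x,z), T(x,v), U(x,w) on x using only
    the tuples that were routed to this server."""
    def index_by_x(rel):
        index = {}
        for x, value in rel:
            index.setdefault(x, []).append(value)
        return index

    ri, si, ti, ui = (index_by_x(rel) for rel in (r, s, t, u))
    out = set()
    # Only x values present in all four local fragments can produce answers.
    for x in ri.keys() & si.keys() & ti.keys() & ui.keys():
        for y, z, v, w in product(ri[x], si[x], ti[x], ui[x]):
            out.add((x, y, z, v, w))
    return out


def one_step_flat_query(R, S, T, U, num_servers=4):
    """Simulate one synchronized step: partition all relations on x,
    join locally on each server, and take the union of local answers."""
    shards = [partition(rel, num_servers) for rel in (R, S, T, U)]
    result = set()
    for p in range(num_servers):
        result |= local_flat_join(*(shard[p] for shard in shards))
    return result


if __name__ == "__main__":
    R = [(1, "a"), (2, "b")]
    S = [(1, "c"), (2, "d"), (3, "e")]
    T = [(1, "f")]
    U = [(1, "g"), (1, "h")]
    print(sorted(one_step_flat_query(R, S, T, U)))
```

Because every atom contains x, all tuples that can contribute to the same answer share an x value and therefore land on the same server, so no second round of communication is needed for correctness in this sketch.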