The availability of large data centers with tens of thousands of servers has led to the widespread adoption of massive parallelism for analyzing large datasets. Several query languages exist for running queries on massively parallel architectures, some based on the MapReduce infrastructure, others using proprietary implementations. Motivated by this trend, this paper analyzes the parallel complexity of conjunctive queries. We propose a very simple model of parallel computation that captures these architectures, in which the complexity parameter is the number of parallel steps requiring synchronization of all servers. We study the complexity of conjunctive queries and give a complete characterization of the queries that can be computed in one parallel step. These form a strict subset of hierarchical queries and include flat queries such as R(x,y), S(x,z), T(x,v), U(x,w), tall queries such as R(x), S(x,y), T(x,y,z), U(x,y,z,w), and combinations thereof, which we call tall-flat queries. We describe an algorithm for computing any tall-flat query in parallel, and prove that any query that is not tall-flat cannot be computed in one step in this model. Finally, we present extensions of our results to queries that are not tall-flat.
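To make the one-step idea concrete, the following is a minimal sketch for the flat query R(x,y), S(x,z), T(x,v), U(x,w) only. It is an illustration under stated assumptions, not the algorithm from the paper: it assumes every relation is a list of binary tuples whose first component is the shared variable x, it simulates the servers as in-memory lists, and the names `one_step_flat_join`, `relations`, and `num_servers` are hypothetical. Each tuple is routed to a server by hashing x, after which every server joins its own fragments without further communication.

```python
from collections import defaultdict
from itertools import product


def one_step_flat_join(relations, num_servers):
    """Simulate one parallel step for a flat query whose atoms all share x.

    relations: list of relations, each a list of (x, other) pairs.
    Returns the set of answer tuples (x, other_1, ..., other_k).
    """
    # Communication phase: send each tuple to the server chosen by hashing x.
    # servers[s][i] maps x -> values of relation i that landed on server s.
    servers = [
        [defaultdict(list) for _ in relations] for _ in range(num_servers)
    ]
    for i, relation in enumerate(relations):
        for x, other in relation:
            servers[hash(x) % num_servers][i][x].append(other)

    # Local phase: each server joins only its own fragments.
    answers = set()
    for fragments in servers:
        # Keep the x values that occur in every relation's local fragment.
        shared = set(fragments[0])
        for fragment in fragments[1:]:
            shared &= set(fragment)
        for x in shared:
            for combo in product(*(fragment[x] for fragment in fragments)):
                answers.add((x, *combo))
    return answers


if __name__ == "__main__":
    R = [(1, "a"), (2, "b")]
    S = [(1, "c"), (1, "d"), (3, "e")]
    T = [(1, "f"), (2, "g")]
    U = [(1, "h")]
    print(sorted(one_step_flat_join([R, S, T, U], num_servers=4)))
```

Because all tuples with the same x value land on the same server, the union of the local joins equals the full join regardless of the number of servers. The sketch does not cover tall queries or skewed data.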