Implementing recursive algorithms on computing clusters presents a number of new challenges. In particular, we consider the endgame problem: later rounds of a recursion often transfer only small amounts of data, causing high overhead for interprocessor communication. One way to deal with the endgame problem is to use an algorithm that reduces the number of rounds of the recursion. For example, in an application such as transitive closure ("TC"), there are several recursive-doubling algorithms that use a logarithmic, rather than linear, number of rounds. Unfortunately, recursive-doubling algorithms can deduce many more facts than the linear TC algorithms, which could negate the cost savings from eliminating the overhead caused by the proliferation of small files. We are thus led to consider TC algorithms that, like the linear algorithms, have the unique decomposition property, which ensures that each path is discovered only once. We find that many such algorithms exist, and we show that they are incomparable, in that any of them could prove best on some data, in some cases with even lower cost than the linear algorithms. The recursive-doubling approach to TC extends to other recursions as well. However, it is not acceptable to reduce the number of rounds at the expense of a major increase in the number of facts that are deduced. In this paper, we prove that it is possible to implement any Datalog program of right-linear chain rules with a logarithmic number of rounds and no order-of-magnitude increase in the number of facts deduced. On the other hand, there are linear recursions for which the two goals of reducing the number of rounds and maintaining the total number of deduced facts cannot be met simultaneously. We show that the reachability problem cannot be solved in logarithmic rounds without using a binary predicate, thus squaring the number of potential facts to be deduced. We also show that the same-generation recursion cannot be solved in logarithmic rounds without using a predicate of arity three.
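The trade-off between rounds and deduced facts can be illustrated with a small single-process sketch. It is not the paper's cluster implementation: the helper names (`join`, `chain`, `linear_tc`, `doubling_tc`) are hypothetical, both strategies use semi-naive evaluation as an assumption, and a "derivation" is counted every time a join produces a path fact, whether or not that fact is new.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def join(left: Set[Edge], right: Set[Edge]) -> List[Edge]:
    """Return one (x, y) per matching pair (x, z) in left, (z, y) in right."""
    return [(x, y) for (x, z) in left for (z2, y) in right if z == z2]


def chain(n: int) -> Set[Edge]:
    """Edges of the directed chain 0 -> 1 -> ... -> n-1 (illustrative input)."""
    return {(i, i + 1) for i in range(n - 1)}


def linear_tc(edges: Set[Edge]) -> Tuple[Set[Edge], int, int]:
    """Right-linear rule: path(x,y) :- edge(x,z), path(z,y)."""
    path, delta = set(edges), set(edges)
    rounds, derivations = 0, len(edges)
    while delta:
        # Extend only the paths found in the previous round by one edge.
        derived = join(edges, delta)
        derivations += len(derived)
        delta = set(derived) - path
        path |= delta
        rounds += 1
    return path, rounds, derivations


def doubling_tc(edges: Set[Edge]) -> Tuple[Set[Edge], int, int]:
    """Nonlinear rule: path(x,y) :- path(x,z), path(z,y)."""
    path, delta = set(edges), set(edges)
    rounds, derivations = 0, len(edges)
    while delta:
        old = path - delta
        # Semi-naive: at least one side of each join must be a new fact.
        derived = join(delta, path) + join(old, delta)
        derivations += len(derived)
        delta = set(derived) - path
        path |= delta
        rounds += 1
    return path, rounds, derivations


if __name__ == "__main__":
    for name, tc in (("linear", linear_tc), ("doubling", doubling_tc)):
        closure, rounds, derivations = tc(chain(9))
        print(name, len(closure), rounds, derivations)
```

On the 9-node chain, this sketch reports 8 rounds and 36 derivations for the linear rule (one derivation per path, matching the unique decomposition property) against 4 rounds and 92 derivations for doubling, because the nonlinear rule rediscovers a path once for every way of splitting it in two.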