Scalable computation of acyclic joins

Authors:
Anna Pagh; Rasmus Pagh
Affiliations:
IT University of Copenhagen, København, Denmark; IT University of Copenhagen, København, Denmark
Venue:
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2006

Abstract

The join operation of relational algebra is a cornerstone of relational database systems. Computing the join of several relations is NP-hard in general, whereas special (and typical) cases are tractable. This paper considers joins having an acyclic join graph, for which current methods initially apply a full reducer to efficiently eliminate tuples that will not contribute to the result of the join. From a worst-case perspective, previous algorithms for computing an acyclic join of k fully reduced relations, occupying a total of n≥k blocks on disk, use Ω((n+z)k) I/Os, where z is the size of the join result in blocks.

In this paper we show how to compute the join in a time bound that is within a constant factor of the cost of running a full reducer plus sorting the output. For a broad class of acyclic join graphs this is O(sort(n+z)) I/Os, removing the dependence on k from previous bounds. Traditional methods decompose the join into a number of binary joins, which are then carried out one by one. Departing from this approach, our technique is based on computing the size of certain subsets of the result, and using these sizes to compute the location(s) of each data item in the result.

Finally, as an initial study of cyclic joins in the I/O model, we show how to compute a join whose join graph is a 3-cycle, in O(n²/m + sort(n+z)) I/Os, where m is the number of blocks in internal memory.
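
To illustrate the full reducer step mentioned in the abstract, the following is a minimal in-memory sketch, not the paper's I/O-efficient algorithm. It assumes the join graph is given as a rooted tree and applies the classical two-pass semijoin scheme: a bottom-up pass followed by a top-down pass. The names `semijoin` and `full_reducer`, the dictionary-based row representation, and the sample relations R, S, T are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]      # attribute name -> value (illustrative representation)
Relation = List[Row]


def semijoin(left: Relation, right: Relation) -> Relation:
    """Keep the rows of `left` that agree with some row of `right`
    on all attributes the two relations share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    keys = {tuple(row[a] for a in shared) for row in right}
    return [row for row in left if tuple(row[a] for a in shared) in keys]


def full_reducer(
    relations: Dict[str, Relation],
    tree_edges: List[Tuple[str, str]],
) -> Dict[str, Relation]:
    """Remove tuples that cannot contribute to the acyclic join.

    `tree_edges` lists (parent, child) pairs of the join tree in top-down
    order: every edge appears after the edge leading into its parent.
    """
    reduced = {name: list(rows) for name, rows in relations.items()}
    # Bottom-up pass: filter each parent by its children, deepest edges first.
    for parent, child in reversed(tree_edges):
        reduced[parent] = semijoin(reduced[parent], reduced[child])
    # Top-down pass: filter each child by its already reduced parent.
    for parent, child in tree_edges:
        reduced[child] = semijoin(reduced[child], reduced[parent])
    return reduced


# Hypothetical path-shaped join R(a,b), S(b,c), T(c,d), rooted at R.
relations = {
    "R": [{"a": 1, "b": 1}, {"a": 2, "b": 2}, {"a": 3, "b": 3}],
    "S": [{"b": 1, "c": 1}, {"b": 2, "c": 5}, {"b": 4, "c": 1}],
    "T": [{"c": 1, "d": 7}, {"c": 9, "d": 8}],
}
reduced = full_reducer(relations, [("R", "S"), ("S", "T")])
```

After both passes, every remaining tuple takes part in at least one result tuple, which is the "fully reduced" precondition the abstract's bounds refer to. The sketch uses hash sets in memory; an external-memory version would presumably replace each semijoin with sorting and merging, which is outside the scope of this example.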