Fast joins using join indices

Authors:
Zhe Li;Kenneth A. Ross
Affiliations:
Department of Computer Science, Columbia University, New York, NY 10027; e-mail: {li,kar}@cs.columbia.edu
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
1999

Abstract

Two new algorithms, “Jive join” and “Slam join,” are proposed for computing the join of two relations using a join index. The algorithms are duals: Jive join range-partitions input relation tuple ids and then processes each partition, while Slam join forms ordered runs of input relation tuple ids and then merges the results. Both algorithms make a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a temporary file, whose size is half that of the join index. Both algorithms require only that the number of blocks in main memory is of the order of the square root of the number of blocks in the smaller relation. By storing intermediate and final join results in a vertically partitioned fashion, our algorithms need to manipulate less data in memory at a given time than other algorithms. The algorithms are resistant to data skew and adaptive to memory fluctuations. Selection conditions can be incorporated into the algorithms. Using a detailed cost model, the algorithms are analyzed and compared with competing algorithms. For large input relations, our algorithms perform significantly better than Valduriez's algorithm, the TID join algorithm, and hash join algorithms. An experimental study is also conducted to validate the analytical results and to demonstrate the performance characteristics of each algorithm in practice.
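
The partition-then-process structure that the abstract attributes to Jive join can be illustrated with a small in-memory sketch. This is not the authors' implementation: it assumes the join index is a list of (R tuple id, S tuple id) pairs sorted by R tuple id, models relations and files as Python lists, and uses hypothetical names (`jive_join`, `stitch`, `boundaries`) chosen only for illustration. Buffering, block I/O, and the cost model are omitted.

```python
from bisect import bisect_right

def jive_join(join_index, R, S, boundaries):
    """Simplified sketch of a Jive-join-style computation.

    join_index: list of (r_tid, s_tid) pairs, ASSUMED sorted by r_tid.
    R, S:       lists indexed by tuple id (stand-ins for the stored relations).
    boundaries: sorted S tuple ids that split the S id space into ranges
                (hypothetical parameter; one partition per range).
    Returns two vertically partitioned outputs: the R-side and S-side of
    the result, aligned position by position within each partition.
    """
    k = len(boundaries) + 1
    r_out = [[] for _ in range(k)]  # R-side result "files", one per partition
    temp = [[] for _ in range(k)]   # temporary "files" holding only S tuple ids

    # Step 1: one pass over the join index. Because the index is sorted by
    # r_tid, R is read in ascending tuple-id order (a sequential pass).
    for r_tid, s_tid in join_index:
        p = bisect_right(boundaries, s_tid)  # range partition of the S tuple id
        r_out[p].append(R[r_tid])
        temp[p].append(s_tid)

    # Step 2: process each partition. Partitions cover increasing S id ranges,
    # so fetching sorted ids partition by partition reads S in ascending order.
    s_out = []
    for p in range(k):
        needed = sorted(set(temp[p]))              # distinct ids, ascending
        fetched = {tid: S[tid] for tid in needed}  # each S tuple read once
        # Emit S tuples in the order recorded in the temporary file so they
        # line up with the R tuples already written for this partition.
        s_out.append([fetched[tid] for tid in temp[p]])
    return r_out, s_out

def stitch(r_out, s_out):
    """Recombine the two vertical partitions into joined (R, S) pairs."""
    return [(r, s) for rp, sp in zip(r_out, s_out) for r, s in zip(rp, sp)]
```

In this sketch the temporary lists hold only S tuple ids, one per join index entry, which mirrors the abstract's statement that the temporary file is half the size of the join index. Calling `stitch(*jive_join(join_index, R, S, boundaries))` yields the same set of pairs as looking up every join index entry directly, although in partition order rather than join index order.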