Adaptive algorithms for set containment joins

Authors: Sergey Melnik; Hector Garcia-Molina
Affiliations: Stanford University, Stanford, CA; Stanford University, Stanford, CA
Venue: ACM Transactions on Database Systems (TODS)
Year: 2003

Citing 10
Cited 15

Evaluation of signature files as set access facilities in OODBs

SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Quickly generating billion-record synthetic databases

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Partition based spatial-merge join

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
On the analysis of indexing schemes

PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
WebBase: a repository of Web pages

Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking
Signature files: an access method for documents and its analytical performance evaluation

ACM Transactions on Information Systems (TOIS)
On the complexity of join predicates

PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Divide-and-Conquer Algorithm for Computing Set Containment Joins

EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Set Containment Joins: The Good, The Bad and The Ugly

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Evaluation of Main Memory Join Algorithms for Joins with Set Comparison Join Predicates

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases

Processing frequent itemset discovery queries by division and set containment join operators

DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Efficient exact set-similarity joins

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
A combination of trie-trees and inverted files for the indexing of set-valued attributes

CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
SQL query optimization through nested relational algebra

ACM Transactions on Database Systems (TODS)
Ed-Join: an efficient algorithm for similarity joins with edit distance constraints

Proceedings of the VLDB Endowment
Subsumption and complementation as data fusion operators

Proceedings of the 13th International Conference on Extending Database Technology
On indexing error-tolerant set containment

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Efficient answering of set containment queries for skewed item distributions

Proceedings of the 14th International Conference on Extending Database Technology
A constraint-based tool for data integrity management on the web

Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication
Using prefix-trees for efficiently computing set joins

DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Efficient main-memory algorithms for set containment join using inverted lists

ADBIS'05 Proceedings of the 9th East European conference on Advances in Databases and Information Systems
Efficient processing of probabilistic set-containment queries on uncertain set-valued data

Information Sciences: an International Journal
Filtering and ranking schemes for finding inclusion dependencies on the web

Proceedings of the 21st international conference companion on World Wide Web
Efficient processing of containment queries on nested sets

Proceedings of the 16th International Conference on Extending Database Technology
Efficient filtering and ranking schemes for finding inclusion dependencies on the web

Proceedings of the 22nd ACM international conference on information & knowledge management

Abstract

A set containment join is a join between set-valued attributes of two relations, whose join condition is specified using the subset (⊆) operator. Set containment joins are deployed in many database applications, even those that do not support set-valued attributes. In this article, we propose two novel partitioning algorithms, called the Adaptive Pick-and-Sweep Join (APSJ) and the Adaptive Divide-and-Conquer Join (ADCJ), which allow computing set containment joins efficiently. We show that APSJ outperforms previously suggested algorithms for many data sets, often by an order of magnitude. We present a detailed analysis of the algorithms and study their performance on real and synthetic data using an implemented testbed.
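To make the join condition concrete, here is a minimal sketch of a set containment join in Python. It is not the APSJ or ADCJ algorithm from the article. It is a plain nested-loop baseline with a bitmap-signature prefilter, a technique assumed here only for illustration. The names `signature`, `containment_join`, and the `bits` parameter are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple

SetRelation = Dict[Hashable, FrozenSet[Hashable]]


def signature(s: FrozenSet[Hashable], bits: int = 64) -> int:
    """Fold a set into a fixed-width bitmap (hypothetical helper)."""
    sig = 0
    for element in s:
        sig |= 1 << (hash(element) % bits)
    return sig


def containment_join(R: SetRelation, S: SetRelation,
                     bits: int = 64) -> List[Tuple[Hashable, Hashable]]:
    """Return all (r_id, s_id) pairs such that R[r_id] is a subset of S[s_id].

    Naive nested loop, quadratic in the number of tuples. The signature
    test only rules out pairs cheaply; every surviving pair is verified
    with an exact subset check, so false positives never reach the result.
    """
    s_signatures = {s_id: signature(s, bits) for s_id, s in S.items()}
    result = []
    for r_id, r in R.items():
        r_sig = signature(r, bits)
        for s_id, s in S.items():
            # If r is a subset of s, every bit set in r_sig is set in s_sig.
            if r_sig & ~s_signatures[s_id]:
                continue  # some bit of r is missing from s: cannot be a subset
            if r <= s:  # exact verification
                result.append((r_id, s_id))
    return result


if __name__ == "__main__":
    R = {1: frozenset({1, 2}), 2: frozenset({3}), 3: frozenset()}
    S = {10: frozenset({1, 2, 3}), 20: frozenset({2, 3})}
    print(sorted(containment_join(R, S)))
```

Partitioning algorithms such as those proposed in the article aim to avoid comparing every pair of tuples. The sketch above compares all pairs and serves only as a reference for what the join must return.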