IEEE Transactions on Knowledge and Data Engineering
We study the problem of computing containment queries on nested sets, i.e., sets whose elements can be both atomic and set-valued objects. Containment is a fundamental query pattern with many basic applications. Our study of nested set containment is motivated by the ubiquity of nested data in practice, e.g., in XML and JSON data management, in business and scientific workflow management, and in web analytics. Furthermore, to our knowledge, there are no efficient solutions for computing containment queries on massive collections of nested sets. Our specific contributions in this paper are: (1) we introduce two novel algorithms for efficient evaluation of containment queries on massive collections of nested sets; (2) we study caching and filtering mechanisms to accelerate query processing in the algorithms; (3) we develop extensions to the algorithms to (a) compute several related query types and (b) accommodate natural variations of the semantics of containment; and (4) we present analytic and empirical analyses which demonstrate that both algorithms are efficient and scalable.
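To make the query pattern concrete, the following is a minimal sketch of a naive containment check over a small collection of nested sets. It is not one of the two algorithms introduced in the paper. It assumes one possible semantics of containment (every atom of the query must appear in the data set, and every set-valued element of the query must be contained in some set-valued element of the data set), and the names `contains`, `query`, and `docs` are hypothetical.

```python
def contains(s: frozenset, q: frozenset) -> bool:
    """Return True if nested set q is contained in nested set s.

    Assumed semantics (an illustration, not taken from the paper):
    each atom of q must be an element of s, and each set-valued
    element of q must be contained in some set-valued element of s.
    """
    for item in q:
        if isinstance(item, frozenset):
            # A nested query element needs at least one matching nested child of s.
            if not any(isinstance(c, frozenset) and contains(c, item) for c in s):
                return False
        elif item not in s:
            # An atomic query element must appear directly in s.
            return False
    return True


def query(collection: dict, q: frozenset) -> list:
    """Scan the whole collection and return the keys of the sets that contain q."""
    return [name for name, s in collection.items() if contains(s, q)]


# Hypothetical example collection of nested sets.
docs = {
    "d1": frozenset({"a", "b", frozenset({"c", "d"})}),
    "d2": frozenset({"a", frozenset({"c"})}),
    "d3": frozenset({"a", "b"}),
}
q = frozenset({"a", frozenset({"c"})})
print(query(docs, q))  # ['d1', 'd2']
```

This baseline scans every set in the collection for each query, which is the cost that index-based evaluation, caching, and filtering aim to avoid on massive collections.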