Processing frequent itemset discovery queries by division and set containment join operators

Authors:
Ralf Rantzau
Affiliations:
University of Stuttgart, Universitätsstraße, Stuttgart, Germany
Venue:
DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Year:
2003

Abstract

SQL-based data mining algorithms are rarely used in practice today. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. Nevertheless, database vendors try to integrate analysis functionalities to some extent into their query execution and optimization components in order to narrow the gap between data and processing. Such database support is particularly important when data mining applications need to analyze very large datasets or when they need access to current data, not a possibly outdated copy of it.

We investigate approaches based on SQL for the problem of finding frequent itemsets in a transaction table, including an algorithm that we recently proposed, called Quiver, which employs universal and existential quantifications. This approach employs a table schema for itemsets that is similar to the commonly used vertical layout for transactions: each item of an itemset is stored in a separate row. We argue that expressing the frequent itemset discovery problem using quantifications offers interesting opportunities to process such queries using set containment join or set containment division operators, which are not yet available in commercial database systems. Initial performance experiments reveal that Quiver cannot be processed efficiently by commercial DBMS. However, our experiments with query execution plans that use operators realizing set containment tests suggest that efficient processing of Quiver is possible.
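
To make the vertical layout and the set containment test concrete, the following is a minimal Python sketch. It is not the Quiver algorithm and not any DBMS operator from the paper; it only illustrates the general idea of counting, for each candidate itemset, the transactions that contain all of its items. The function names (`group_by_key`, `frequent_itemsets`), the sample rows, and the `min_support` parameter are assumptions introduced for illustration.

```python
from collections import defaultdict


def group_by_key(rows):
    """Collect vertical-layout rows (key, item) into {key: frozenset(items)}.

    Each row holds exactly one item, mirroring the one-item-per-row
    schema described in the abstract (hypothetical helper).
    """
    groups = defaultdict(set)
    for key, item in rows:
        groups[key].add(item)
    return {key: frozenset(items) for key, items in groups.items()}


def frequent_itemsets(candidate_rows, transaction_rows, min_support):
    """Return {itemset_id: support} for candidates meeting min_support.

    This is a naive nested-loop set containment join: a candidate matches
    a transaction when every item of the candidate occurs in the
    transaction (universal quantification over the candidate's items).
    """
    candidates = group_by_key(candidate_rows)
    transactions = group_by_key(transaction_rows)
    result = {}
    for itemset_id, itemset in candidates.items():
        # "itemset <= items" is Python's subset test on frozensets.
        support = sum(1 for items in transactions.values() if itemset <= items)
        if support >= min_support:
            result[itemset_id] = support
    return result


# Illustrative data only (assumed, not from the paper).
transaction_rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "butter"),
]
candidate_rows = [
    ("c1", "bread"), ("c1", "milk"),
    ("c2", "butter"), ("c2", "milk"),
]

print(frequent_itemsets(candidate_rows, transaction_rows, min_support=2))
```

The nested loop compares every candidate with every transaction, which is the cost that dedicated set containment join or division operators would aim to reduce; the sketch makes no claim about how such operators are implemented.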