SQL based frequent pattern mining without candidate generation

Authors:
Xuequn Shang; Kai-Uwe Sattler; Ingolf Geist
Affiliations:
University of Magdeburg, Magdeburg, Germany; University of Magdeburg, Magdeburg, Germany; University of Magdeburg, Magdeburg, Germany
Venue:
Proceedings of the 2004 ACM symposium on Applied computing
Year:
2004

Abstract

Scalable data mining in large databases is one of today's real challenges for the database research area. The integration of data mining with database systems is an essential component of any successful large-scale data mining application. A fundamental component of data mining tasks is finding frequent patterns in a given dataset. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study we present an evaluation of SQL-based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we have carried out a performance evaluation on a commercial DBMS (IBM DB2 UDB EEE V8).
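
To illustrate what SQL-based frequent pattern mining starts from, the following is a minimal sketch of the first step common to Apriori-like and FP-growth approaches: counting item supports with a single GROUP BY query and keeping the items that reach a minimum support. It is not the authors' SQL. The table name `T`, its columns `tid` and `item`, the function name `frequent_items`, and the sample rows are assumptions made for illustration, and SQLite stands in for the DB2 system used in the paper. The result is ordered by descending support, the order in which FP-growth arranges frequent items before building its tree.

```python
import sqlite3


def frequent_items(rows, min_support):
    """Return (item, support) pairs with support >= min_support.

    rows: iterable of (tid, item) pairs, one row per item occurrence.
    The table layout T(tid, item) is an assumption for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO T VALUES (?, ?)", rows)
    # Support of an item = number of distinct transactions containing it.
    result = conn.execute(
        """
        SELECT item, COUNT(DISTINCT tid) AS support
        FROM T
        GROUP BY item
        HAVING COUNT(DISTINCT tid) >= ?
        ORDER BY support DESC, item
        """,
        (min_support,),
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data: three transactions over items a, b, c, d.
ROWS = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "d"),
]
FREQUENT = frequent_items(ROWS, 2)
print(FREQUENT)
```

With a minimum support of 2, only items `a` and `b` survive. An Apriori-like method would next generate and test candidate 2-itemsets from these items, whereas FP-growth projects the transactions onto the frequent items and mines them recursively without generating candidates.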