Frequent itemset mining with parallel RDBMS

Authors:
Xuequn Shang;Kai-Uwe Sattler
Affiliations:
Department of Computer Science, University of Magdeburg, Magdeburg, Germany;Department of Computer Science and Automation, Technical University of Ilmenau
Venue:
PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2005



Abstract

Data mining on large relational databases has gained popularity, and its significance is well recognized. However, the performance of SQL-based data mining is known to lag behind that of specialized implementations. We investigate SQL-based approaches to the problem of finding frequent patterns in a transaction table, including an algorithm that we recently proposed, called Ppropad (Parallel PROjection PAttern Discovery). Ppropad successively projects the transaction table into frequent itemsets, which avoids both making multiple passes over the large original transaction table and generating huge sets of candidates. We have built a parallel database system with DB2 and evaluated performance on it. Our evaluation shows that data mining with SQL can achieve sufficient performance when database tuning is used.
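
The projection idea described in the abstract can be illustrated with a small sketch. This is not the authors' Ppropad implementation: it runs on a single node, uses Python's built-in sqlite3 module in place of DB2, and does no parallelization or tuning. The table layout `(tid, item)`, the function names `frequent_items`, `mine`, and `build_demo_db`, and the temporary table names `proj_<depth>` are all assumptions made for illustration. The sketch only shows how a transaction table might be projected recursively with SQL so that each step works on a smaller table instead of rescanning the original one or enumerating candidates.

```python
import sqlite3


def frequent_items(conn, table, min_support):
    """Return (item, support) rows for items in at least min_support transactions."""
    # Table names are generated internally, so interpolating them is safe here.
    query = (
        f"SELECT item, COUNT(DISTINCT tid) FROM {table} "
        "GROUP BY item HAVING COUNT(DISTINCT tid) >= ? ORDER BY item"
    )
    return conn.execute(query, (min_support,)).fetchall()


def mine(conn, table, min_support, prefix=(), depth=0, out=None):
    """Collect frequent itemsets by recursively projecting `table` with SQL."""
    if out is None:
        out = {}
    proj = f"proj_{depth}"  # hypothetical name; one live projection per depth
    for item, support in frequent_items(conn, table, min_support):
        itemset = prefix + (item,)
        out[itemset] = support
        # Projection step: keep only transactions that contain `item`, and
        # only the items ordered after it, so each itemset is produced once.
        conn.execute(f"CREATE TEMP TABLE {proj} (tid INTEGER, item TEXT)")
        conn.execute(
            f"INSERT INTO {proj} SELECT tid, item FROM {table} "
            f"WHERE item > ? AND tid IN (SELECT tid FROM {table} WHERE item = ?)",
            (item, item),
        )
        mine(conn, proj, min_support, itemset, depth + 1, out)
        conn.execute(f"DROP TABLE {proj}")
    return out


def build_demo_db():
    """Create an in-memory transaction table with five toy baskets."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
    baskets = {1: "abc", 2: "ab", 3: "ac", 4: "bc", 5: "abc"}
    rows = [(tid, item) for tid, items in baskets.items() for item in items]
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    return conn


conn = build_demo_db()
for itemset, support in sorted(mine(conn, "transactions", 3).items()):
    print(itemset, support)
```

Each recursive call queries only the projected table from the previous level, so the original `transactions` table is read only at the top level. A parallel RDBMS would presumably distribute these tables and queries across nodes, which this sketch does not attempt.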