Shaping SQL-based frequent pattern mining algorithms

Authors:
Csaba István Sidló; András Lukács
Affiliations:
Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary; Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary
Venue:
KDID'05 Proceedings of the 4th international conference on Knowledge Discovery in Inductive Databases
Year:
2005

Abstract

Integrating data mining with database management systems could significantly ease knowledge discovery in large databases. We consider implementations of frequent itemset mining (FIM) algorithms, in particular pattern-growth algorithms similar to the top-down FP-growth variations, that are tightly coupled to relational database management systems. Our implementations remain within the confines of conventional relational database facilities such as tables, indices, and SQL operations. We compare our algorithm to the most promising previously proposed SQL-based FIM algorithm. Experiments show that our method performs better in many cases but still has severe limitations compared to traditional stand-alone implementations of pattern-growth methods. We identify the bottlenecks of our SQL-based pattern-growth methods and investigate the applicability of tightly coupled algorithms in practice.
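
To make concrete what staying within tables, indices, and SQL operations can look like, the following is a minimal sketch of frequent itemset counting done entirely with SQL statements. It is not the pattern-growth algorithm of the paper: it uses simple join-based counting of 1- and 2-itemsets, and the table names (transactions, f1), the column names (tid, item), the function name frequent_pairs, and the use of SQLite are all assumptions made for illustration.

```python
import sqlite3


def frequent_pairs(rows, min_support):
    """Count frequent items and frequent 2-itemsets using only SQL.

    rows: iterable of (tid, item) pairs (hypothetical layout).
    min_support: minimum number of transactions that must contain an itemset.
    """
    conn = sqlite3.connect(":memory:")
    # One row per (transaction, item); this vertical layout is an assumption.
    conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
    conn.execute("CREATE INDEX idx_item_tid ON transactions (item, tid)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

    # Step 1: frequent single items, materialized in a helper table.
    conn.execute("CREATE TABLE f1 (item TEXT PRIMARY KEY, support INTEGER)")
    conn.execute(
        "INSERT INTO f1 "
        "SELECT item, COUNT(DISTINCT tid) FROM transactions "
        "GROUP BY item HAVING COUNT(DISTINCT tid) >= ?",
        (min_support,),
    )
    items = conn.execute(
        "SELECT item, support FROM f1 ORDER BY item"
    ).fetchall()

    # Step 2: frequent pairs via a self-join restricted to frequent items.
    # a.item < b.item ensures each unordered pair is counted once.
    pairs = conn.execute(
        "SELECT a.item, b.item, COUNT(DISTINCT a.tid) AS support "
        "FROM transactions a "
        "JOIN transactions b ON a.tid = b.tid AND a.item < b.item "
        "WHERE a.item IN (SELECT item FROM f1) "
        "  AND b.item IN (SELECT item FROM f1) "
        "GROUP BY a.item, b.item "
        "HAVING COUNT(DISTINCT a.tid) >= ? "
        "ORDER BY a.item, b.item",
        (min_support,),
    ).fetchall()

    conn.close()
    return items, pairs


if __name__ == "__main__":
    data = [
        (1, "a"), (1, "b"), (1, "c"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"),
        (4, "b"),
    ]
    print(frequent_pairs(data, min_support=2))
```

In this sketch every intermediate result is an ordinary table or query result, so the database engine does all of the counting. The self-join in the second step is the kind of operation whose cost grows quickly with transaction length.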