Cost-based query optimization for complex pattern mining on multiple databases

Authors:
Ruoming Jin;Dave Fuhry;Abdulkareem Alali
Affiliations:
Kent State University, Kent, OH;Kent State University, Kent, OH;Kent State University, Kent, OH
Venue:
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Year:
2008

Abstract

For complex data mining queries, query optimization issues arise that are similar to those for traditional database queries. However, few works have applied cost-based query optimization, the key technique for optimizing traditional database queries, to complex mining queries. In this work, we develop a cost-based query optimization framework for an important class of data mining queries: frequent pattern mining across multiple databases. Specifically, we make the following contributions: 1) We present a rich class of queries for mining frequent itemsets across multiple datasets, supported by a SQL-based mechanism. 2) We present an approach to enumerate all possible query plans for these mining queries, and build on the enumeration algorithm to develop a dynamic programming approach and a branch-and-bound approach that find optimal query plans with the least mining cost. 3) We introduce models to estimate the cost of individual mining operators. 4) We evaluate our query optimization techniques on both real and synthetic datasets and show significant performance improvements.
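
To make the idea of searching for a least-cost plan concrete, here is a minimal Python sketch of the two search strategies the abstract names, dynamic programming and branch-and-bound, applied to a toy problem. It is not the paper's formulation: the operator names, the sub-results they produce, and the cost figures are all hypothetical, and a "plan" is simplified to a set of operators that together produce every required sub-result.

```python
from functools import lru_cache

# Hypothetical operators (assumption, not from the paper):
# name -> (sub-results the operator produces, estimated cost).
OPERATORS = {
    "mine_A": (frozenset({"A"}), 5.0),
    "mine_B": (frozenset({"B"}), 4.0),
    "mine_C": (frozenset({"C"}), 6.0),
    "mine_A_and_B": (frozenset({"A", "B"}), 7.0),
    "mine_B_and_C": (frozenset({"B", "C"}), 7.0),
}


def dp_plan(required, operators):
    """Dynamic programming: memoize the cheapest plan per set of missing sub-results."""

    @lru_cache(maxsize=None)
    def best(missing):
        if not missing:
            return 0.0, ()
        # Always cover one fixed element first so that different orderings
        # of the same operators are not explored separately.
        target = min(missing)
        result = (float("inf"), ())
        for name, (produces, cost) in sorted(operators.items()):
            if target in produces:
                sub_cost, sub_plan = best(missing - produces)
                if cost + sub_cost < result[0]:
                    result = (cost + sub_cost, (name,) + sub_plan)
        return result

    return best(frozenset(required))


def branch_and_bound_plan(required, operators):
    """Branch-and-bound: depth-first search that prunes partial plans
    whose cost already matches or exceeds the best complete plan found."""
    best_cost, best_plan = float("inf"), ()

    def search(missing, cost_so_far, plan):
        nonlocal best_cost, best_plan
        if cost_so_far >= best_cost:
            return  # bound: cannot improve on the incumbent plan
        if not missing:
            best_cost, best_plan = cost_so_far, plan
            return
        target = min(missing)
        candidates = [
            (cost, name, produces)
            for name, (produces, cost) in operators.items()
            if target in produces
        ]
        # Try cheaper operators first to find a good incumbent early.
        for cost, name, produces in sorted(candidates, key=lambda t: (t[0], t[1])):
            search(missing - produces, cost_so_far + cost, plan + (name,))

    search(frozenset(required), 0.0, ())
    return best_cost, best_plan


if __name__ == "__main__":
    print(dp_plan({"A", "B", "C"}, OPERATORS))
    print(branch_and_bound_plan({"A", "B", "C"}, OPERATORS))
```

Both functions return the same plan on this toy input. In a real optimizer the per-operator costs would come from cost models such as those the abstract mentions, rather than from fixed constants.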