Complex data mining queries raise query optimization issues similar to those of traditional database queries. However, few works have applied cost-based query optimization, the key technique for optimizing traditional database queries, to complex mining queries. In this work, we develop a cost-based query optimization framework for an important class of data mining queries: frequent pattern mining across multiple databases. Specifically, we make the following contributions: 1) We present a rich class of queries for mining frequent itemsets across multiple datasets, supported by a SQL-based mechanism. 2) We present an approach to enumerate all possible query plans for these mining queries and, based on this enumeration algorithm, develop a dynamic programming approach and a branch-and-bound approach to find optimal query plans with the least mining cost. 3) We introduce models to estimate the cost of individual mining operators. 4) We evaluate our query optimization techniques on both real and synthetic datasets and show significant performance improvements.
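To make the query class concrete, the following is a minimal sketch of what a frequent-itemset query across two datasets might compute. It is not the framework described above: the function name `frequent_itemsets`, the toy datasets `store_a` and `store_b`, the 0.5 support threshold, and the brute-force counting are all assumptions chosen for illustration. The sketch mines each dataset independently and then combines the results, which is only one naive evaluation plan; a cost-based optimizer would compare it against alternative plans.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force miner: return {itemset: support} for itemsets of up to
    max_size items whose support fraction is at least min_support."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Hypothetical toy datasets (not taken from the paper).
store_a = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "eggs"], ["bread"]]
store_b = [["milk", "bread"], ["bread", "eggs"], ["milk", "bread"], ["eggs"]]

# Naive plan: mine every dataset in full, then combine the answers.
freq_a = frequent_itemsets(store_a, 0.5)
freq_b = frequent_itemsets(store_b, 0.5)

# Itemsets frequent in both datasets.
frequent_in_both = set(freq_a) & set(freq_b)
# Itemsets frequent in A but not in B.
frequent_in_a_only = set(freq_a) - set(freq_b)

print(sorted(sorted(s) for s in frequent_in_both))
print(sorted(sorted(s) for s in frequent_in_a_only))
```

In this naive plan both datasets are mined in full before the results are combined. Plans that use the result from one dataset to restrict the work done on the other can be cheaper, which is the kind of choice cost-based plan selection is meant to make.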