A framework to support multiple query optimization for complex mining tasks

Authors:
Ruoming Jin;Kaushik Sinha;Gagan Agrawal
Affiliations:
Ohio State University, Columbus, OH;Ohio State University, Columbus, OH;Ohio State University, Columbus, OH
Venue:
MDM '05 Proceedings of the 6th international workshop on Multimedia data mining: mining integrated media and complex data
Year:
2005

Abstract

With the increasing use of data mining tools and techniques, we envision that a Knowledge Discovery and Data Mining System (KDDMS) will have to support and optimize for the following scenarios: 1) Sequence of Queries: a user may analyze one or more datasets by issuing a sequence of related complex mining queries, and 2) Multiple Simultaneous Queries: several users may analyze a set of datasets concurrently and may issue related complex queries.

This paper presents a systematic mechanism to optimize for these cases, targeting the class of mining queries that involve frequent pattern mining on one or multiple datasets. We present a system architecture and propose new algorithms for this purpose. We show the design of a knowledgeable cache, which can store the results of past queries on multiple datasets. We present algorithms that use the results stored in such a cache to further optimize multiple queries.

We have implemented and evaluated our system with both real and synthetic datasets. Our experimental results show that our techniques can achieve a speedup of up to a factor of 9 compared with systems that do not support caching or optimize for multiple queries.
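
The idea of reusing cached results for a sequence of related frequent-pattern queries can be illustrated with a minimal sketch. This is not the paper's architecture or algorithms, which the abstract does not detail. It relies only on a standard property of frequent itemset mining: the itemsets frequent at a higher support threshold are a subset of those frequent at a lower threshold on the same dataset, so a later query can be answered by filtering an earlier result. The names `mine_frequent_itemsets`, `KnowledgeCache`, and `mining_runs` are hypothetical, and the miner is a brute-force level-wise search chosen for brevity.

```python
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Level-wise mining; returns {itemset: count} for counts >= min_support.

    Illustrative brute-force miner (assumption), not the paper's algorithm.
    """
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    result = {}
    size = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Build candidates one item larger by joining frequent itemsets.
        size += 1
        current = list(
            {a | b for a, b in combinations(frequent, 2) if len(a | b) == size}
        )
    return result


class KnowledgeCache:
    """Hypothetical cache keyed by dataset name (an assumption of this sketch)."""

    def __init__(self):
        self._store = {}       # name -> (min_support, {itemset: count})
        self.mining_runs = 0   # number of times the miner was actually run

    def query(self, name, transactions, min_support):
        cached = self._store.get(name)
        if cached is not None and cached[0] <= min_support:
            # Cache hit: the cached result was mined at a lower or equal
            # threshold, so filtering it yields the exact answer.
            return {s: n for s, n in cached[1].items() if n >= min_support}
        # Cache miss: mine from scratch and remember the result.
        self.mining_runs += 1
        result = mine_frequent_itemsets(transactions, min_support)
        self._store[name] = (min_support, result)
        return result


transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bc"),
    frozenset("abc"),
]
cache = KnowledgeCache()
first = cache.query("demo", transactions, 2)   # mined from the data
second = cache.query("demo", transactions, 3)  # answered from the cache
```

In this sketch the second query never touches the transactions, which is the kind of saving a result cache aims for. Reuse across different datasets or across concurrent queries, which the abstract also mentions, is outside what this example covers.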