A framework to support multiple query optimization for complex mining tasks

  • Authors:
  • Ruoming Jin; Kaushik Sinha; Gagan Agrawal

  • Affiliations:
  • Ohio State University, Columbus, OH; Ohio State University, Columbus, OH; Ohio State University, Columbus, OH

  • Venue:
  • MDM '05 Proceedings of the 6th international workshop on Multimedia data mining: mining integrated media and complex data
  • Year:
  • 2005

Abstract

With the increasing use of data mining tools and techniques, we envision that a Knowledge Discovery and Data Mining System (KDDMS) will have to support and optimize for the following scenarios: 1) Sequence of Queries: a user may analyze one or more datasets by issuing a sequence of related complex mining queries, and 2) Multiple Simultaneous Queries: several users may be analyzing a set of datasets concurrently, and may issue related complex queries.

This paper presents a systematic mechanism to optimize for the above cases, targeting the class of mining queries involving frequent pattern mining on one or multiple datasets. We present a system architecture and propose new algorithms for this purpose. We show the design of a knowledgeable cache which can store the past query results from queries on multiple datasets. We present algorithms which enable the use of the results stored in such a cache to further optimize multiple queries.

We have implemented and evaluated our system with both real and synthetic datasets. Our experimental results show that our techniques can achieve a speedup of up to a factor of 9, compared with systems that neither support caching nor optimize for multiple queries.
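
The abstract does not spell out the cache design or the optimization algorithms, so the following is only a minimal sketch of the general idea of reusing cached frequent-pattern results, not the authors' method. It assumes one simple reuse rule: if a dataset was already mined at some support threshold, any later query on the same dataset with an equal or higher threshold can be answered by filtering the cached result. The names `mine_frequent_itemsets` and `FrequentPatternCache` are hypothetical, and the miner is a brute-force stand-in for a real frequent pattern algorithm.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Brute-force miner (illustrative only): returns {itemset: support count}."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Count every non-empty subset of the transaction once.
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


class FrequentPatternCache:
    """Hypothetical cache keyed by dataset name (an assumption, not the paper's design)."""

    def __init__(self):
        self._entries = {}  # dataset name -> (min_support, {itemset: count})
        self.hits = 0
        self.misses = 0

    def query(self, name, transactions, min_support):
        cached = self._entries.get(name)
        if cached is not None and cached[0] <= min_support:
            # Cached result was mined at a lower or equal threshold,
            # so it is a superset of the answer: filter instead of re-mining.
            self.hits += 1
            return {s: c for s, c in cached[1].items() if c >= min_support}
        # No usable entry: mine from scratch and keep the more general result.
        self.misses += 1
        result = mine_frequent_itemsets(transactions, min_support)
        self._entries[name] = (min_support, result)
        return result


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a"]]
cache = FrequentPatternCache()
first = cache.query("demo", transactions, min_support=2)   # mined from scratch
second = cache.query("demo", transactions, min_support=3)  # answered from the cache
```

In this sketch the second query corresponds to the "sequence of queries" scenario: a user tightens the support threshold and the system avoids a second pass over the data. Queries that span multiple datasets, or that arrive concurrently from several users, would need a richer cache than this single-dataset rule.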