Simultaneous optimization of complex mining tasks with a knowledgeable cache

  • Authors: Ruoming Jin, Kaushik Sinha, Gagan Agrawal

  • Affiliations: Ohio State University, Columbus, OH (all authors)

  • Venue: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
  • Year: 2005

Abstract

With the increasing use of data mining tools and techniques, we envision that a Knowledge Discovery and Data Mining System (KDDMS) will have to support and optimize for the following scenarios: 1) Sequence of Queries: a user may analyze one or more datasets by issuing a sequence of related complex mining queries; and 2) Multiple Simultaneous Queries: several users may analyze a set of datasets concurrently and may issue related complex queries. This paper presents a systematic mechanism to optimize for both cases, targeting the class of mining queries that involve frequent pattern mining on one or multiple datasets. We present a system architecture and propose new algorithms that simultaneously optimize multiple such queries and use a knowledgeable cache to store and reuse past query results. We have implemented our system and evaluated it on both real and synthetic datasets. Our experimental results show that our techniques can achieve a speedup of up to a factor of 9 compared with systems that do not support caching or do not optimize for multiple queries.
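The abstract does not describe how the knowledgeable cache works internally, so the following is only an illustrative sketch of the general idea of reusing past frequent pattern results, not the paper's algorithm. It relies on one well-known property: if a dataset was already mined at a support threshold no higher than the one now requested, the new answer can be obtained by filtering the cached result instead of rescanning the data. The names `mine_frequent_itemsets` and `PatternCache`, the brute-force miner, and the `max_size` limit are all hypothetical and chosen for brevity.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force miner (illustrative only): count every itemset of up to
    max_size items and keep those appearing in at least min_support
    transactions. Returns {frozenset(items): count}."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


class PatternCache:
    """Hypothetical cache of past mining results, keyed by dataset name.

    Assumption: a dataset name always refers to the same transactions."""

    def __init__(self):
        self._store = {}  # name -> (min_support used, {itemset: count})
        self.hits = 0
        self.misses = 0

    def query(self, name, transactions, min_support):
        cached = self._store.get(name)
        if cached is not None and cached[0] <= min_support:
            # The cached result was mined at an equal or lower threshold,
            # so it already contains every itemset the new query needs.
            self.hits += 1
            return {s: c for s, c in cached[1].items() if c >= min_support}
        # Otherwise mine from scratch and remember the (lower) threshold.
        self.misses += 1
        result = mine_frequent_itemsets(transactions, min_support)
        self._store[name] = (min_support, result)
        return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    cache = PatternCache()
    cache.query("demo", data, min_support=2)   # scans the data
    cache.query("demo", data, min_support=4)   # answered from the cache
    print(cache.hits, cache.misses)
```

In this sketch, a sequence of related queries on the same dataset benefits whenever a later query is at least as selective as an earlier one. Optimizing several simultaneous queries, or queries spanning multiple datasets, would need additional machinery that the abstract mentions but does not detail.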