Shared execution strategy for neighbor-based pattern mining requests over streaming windows

  • Authors:
  • Di Yang;Elke A. Rundensteiner;Matthew O. Ward

  • Affiliations:
  • WPI;WPI;WPI

  • Venue:
  • ACM Transactions on Database Systems (TODS)
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

In diverse applications ranging from stock trading to traffic monitoring, data streams are continuously monitored by multiple analysts for extracting patterns of interest in real time. These analysts often submit similar pattern mining requests yet customized with different parameter settings. In this work, we present shared execution strategies for processing a large number of neighbor-based pattern mining requests of the same type yet with arbitrary parameter settings. Such neighbor-based pattern mining requests cover a broad range of popular mining query types, including detection of clusters, outliers, and nearest neighbors. Given the high algorithmic complexity of the mining process, serving multiple such queries in a single system is extremely resource intensive. The naive method of detecting and maintaining patterns for different queries independently is often infeasible in practice, as its demands on system resources increase dramatically with the cardinality of the query workload. In order to maximize the efficiency of the system resource utilization for executing multiple queries simultaneously, we analyze the commonalities of the neighbor-based pattern mining queries, and identify several general optimization principles which lead to significant system resource sharing among multiple queries. In particular, as a preliminary sharing effort, we observe that the computation needed for the range query searches (the process of searching the neighbors for each object) can be shared among multiple queries and thus saves the CPU consumption. Then we analyze the interrelations between the patterns identified by queries with different parameters settings, including both pattern-specific and window-specific parameters. For that, we first introduce an incremental pattern representation, which represents the patterns identified by queries with different pattern-specific parameters within a single compact structure. This enables integrated pattern maintenance for multiple queries. Second, by leveraging the potential overlaps among sliding windows, we propose a metaquery strategy which utilizes a single query to answer multiple queries with different window-specific parameters. By combining these three techniques, namely the range query search sharing, integrated pattern maintenance, and metaquery strategy, our framework realizes fully shared execution of multiple queries with arbitrary parameter settings. It achieves significant savings of computational and memory resources due to shared execution. Our comprehensive experimental study, using real data streams from domains of stock trades and moving object monitoring, demonstrates that our solution is significantly faster than the independent execution strategy, while using only a small portion of memory space compared to the independent execution. We also show that our solution scales in handling large numbers of queries in the order of hundreds or even thousands under high input data rates.