A shared execution strategy for multiple pattern mining requests over streaming data

Authors:
Di Yang;Elke A. Rundensteiner;Matthew O. Ward
Affiliations:
Worcester Polytechnic Institute;Worcester Polytechnic Institute;Worcester Polytechnic Institute
Venue:
Proceedings of the VLDB Endowment
Year:
2009



Abstract

In diverse applications ranging from stock trading to traffic monitoring, popular data streams are typically monitored by multiple analysts for patterns of interest. These analysts may submit similar pattern mining requests, such as cluster detection queries, yet customized with different parameter settings. In this work, we present an efficient shared execution strategy for processing a large number of density-based cluster detection queries with arbitrary parameter settings. Given the high algorithmic complexity of the clustering process and the real-time responsiveness required by streaming applications, serving multiple such queries in a single system is extremely resource intensive. The naive method of detecting and maintaining clusters for different queries independently is often infeasible in practice, as its demands on system resources increase dramatically with the cardinality of the query workload. To overcome this, we analyze the interrelations between the cluster sets identified by queries with different parameter settings, including both pattern-specific and window-specific parameters. We introduce the notion of the growth property among the cluster sets identified by different queries, and characterize the conditions under which it holds. By exploiting this growth property, we propose a uniform solution, called Chandi, which represents identified cluster sets as one single compact structure and performs integrated maintenance on them, resulting in significant sharing of computational and memory resources. Our comprehensive experimental study, using real data streams from the domains of stock trades and moving object monitoring, demonstrates that Chandi is on average four times faster than the best alternative methods, while using 85% less memory space in our test cases. It also shows that Chandi scales to large numbers of queries, on the order of hundreds or even thousands, under high input data rates.
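The abstract does not spell out the growth property, so the following is only an illustrative sketch of one plausible reading: when two density-based queries share a neighbor-count threshold but differ in range threshold, every cluster found by the tighter query lies inside some cluster found by the looser one. The code is the naive per-query baseline, not the Chandi structure. The function names, the parameter names `theta_range` and `theta_count`, the restriction to core points, and the sample points are all assumptions made for illustration.

```python
from itertools import combinations


def core_clusters(points, theta_range, theta_count):
    """Group core points into clusters for ONE query (naive baseline).

    Assumed definitions: a point is 'core' if at least theta_count other
    points lie within theta_range of it; two core points share a cluster
    when a chain of core points links them, each within theta_range of
    the next.
    """
    def close(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 <= theta_range ** 2

    neighbors = {p: set() for p in points}
    for a, b in combinations(points, 2):
        if close(a, b):
            neighbors[a].add(b)
            neighbors[b].add(a)

    cores = {p for p in points if len(neighbors[p]) >= theta_count}

    clusters, seen = [], set()
    for start in sorted(cores):
        if start in seen:
            continue
        # Depth-first traversal over core-to-core links.
        component, stack = set(), [start]
        while stack:
            p = stack.pop()
            if p in component:
                continue
            component.add(p)
            stack.extend((neighbors[p] & cores) - component)
        seen |= component
        clusters.append(frozenset(component))
    return clusters


def nested_in(finer, coarser):
    """True if every cluster in `finer` lies inside some cluster in `coarser`."""
    return all(any(f <= c for c in coarser) for f in finer)


# Hypothetical data: two dense groups separated by a 1.5-unit gap.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (2.5, 0.0), (3.5, 0.0), (2.5, 1.0), (3.5, 1.0)]

tight = core_clusters(points, theta_range=1.0, theta_count=2)
loose = core_clusters(points, theta_range=1.6, theta_count=2)
```

On this sample, the tighter query separates the two groups while the looser query merges them, and `nested_in(tight, loose)` holds. Running each query independently, as done here, repeats the neighbor search per query; a shared strategy would presumably exploit the nesting to store both results in one structure.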