In diverse applications ranging from stock trading to traffic monitoring, popular data streams are typically monitored by multiple analysts for patterns of interest. These analysts may submit similar pattern mining requests, such as cluster detection queries, customized with different parameter settings. In this work, we present an efficient shared execution strategy for processing a large number of density-based cluster detection queries with arbitrary parameter settings. Given the high algorithmic complexity of the clustering process and the real-time responsiveness required by streaming applications, serving multiple such queries in a single system is extremely resource intensive. The naive method of detecting and maintaining clusters for each query independently is often infeasible in practice, as its demands on system resources increase dramatically with the cardinality of the query workload. To overcome this, we analyze the interrelations between the cluster sets identified by queries with different parameter settings, including both pattern-specific and window-specific parameters. We introduce the notion of the growth property among the cluster sets identified by different queries, and characterize the conditions under which it holds. By exploiting this growth property, we propose a uniform solution, called Chandi, which represents the identified cluster sets as a single compact structure and performs integrated maintenance on them, resulting in significant sharing of computational and memory resources. Our comprehensive experimental study, using real data streams from the domains of stock trades and moving object monitoring, demonstrates that Chandi is on average four times faster than the best alternative methods, while using 85% less memory space in our test cases. It also shows that Chandi scales to large numbers of queries, on the order of hundreds or even thousands, under high input data rates.
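The abstract does not define the growth property, so the following is only an illustrative sketch under one plausible reading: when two density-based queries differ only in their neighborhood radius, every cluster found by the stricter query is contained in some cluster found by the looser one. The 1-D points, the function names `core_clusters` and `grows_into`, and the simplification of clustering core points only (border points are omitted) are all assumptions made for this example and are not taken from Chandi.

```python
def core_clusters(points, eps, min_pts):
    """Simplified density-based clustering on 1-D points (hypothetical helper).

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps. Clusters are the connected components of core
    points, where two core points are linked if they are within eps.
    Border points are deliberately left out to keep containment unambiguous.
    """
    def near(a, b):
        return abs(a - b) <= eps

    core = [p for p in points if sum(near(p, q) for q in points) >= min_pts]
    unvisited = set(core)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            linked = {q for q in unvisited if near(p, q)}
            unvisited -= linked
            component |= linked
            frontier.extend(linked)
        clusters.append(frozenset(component))
    return clusters


def grows_into(small, large):
    """True if every cluster in `small` is a subset of some cluster in `large`."""
    return all(any(c <= d for d in large) for c in small)


# Two hypothetical queries over the same data: same min_pts, different radius.
points = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0, 10.0]
tight = core_clusters(points, eps=0.6, min_pts=2)
loose = core_clusters(points, eps=2.0, min_pts=2)

print(len(tight), len(loose), grows_into(tight, loose))
```

In this toy data the stricter query yields two clusters and the looser query merges them into one, so the stricter result never needs to be recomputed from scratch to answer the looser query. That containment is the kind of relationship a shared structure could exploit; how Chandi actually encodes it is not described in the abstract.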