In diverse applications ranging from stock trading to traffic monitoring, data streams are continuously monitored by multiple analysts to extract patterns of interest in real time. These analysts often submit similar pattern mining requests, each customized with different parameter settings. In this work, we present shared execution strategies for processing a large number of neighbor-based pattern mining requests of the same type but with arbitrary parameter settings. Such neighbor-based pattern mining requests cover a broad range of popular mining query types, including the detection of clusters, outliers, and nearest neighbors. Given the high algorithmic complexity of the mining process, serving multiple such queries in a single system is extremely resource intensive. The naive method of detecting and maintaining patterns for each query independently is often infeasible in practice, because its demand on system resources grows dramatically with the number of queries in the workload. To use system resources as efficiently as possible when executing multiple queries simultaneously, we analyze the commonalities among neighbor-based pattern mining queries and identify several general optimization principles that lead to significant resource sharing among multiple queries. In particular, as a preliminary sharing effort, we observe that the computation needed for range query searches (the process of searching for the neighbors of each object) can be shared among multiple queries, which reduces CPU consumption. We then analyze the interrelations between the patterns identified by queries with different parameter settings, including both pattern-specific and window-specific parameters. First, we introduce an incremental pattern representation, which captures the patterns identified by queries with different pattern-specific parameters within a single compact structure. This enables integrated pattern maintenance for multiple queries. Second, by leveraging the potential overlaps among sliding windows, we propose a metaquery strategy that uses a single query to answer multiple queries with different window-specific parameters. By combining these three techniques, namely range query search sharing, integrated pattern maintenance, and the metaquery strategy, our framework realizes fully shared execution of multiple queries with arbitrary parameter settings and achieves significant savings in computational and memory resources. Our comprehensive experimental study, using real data streams from the domains of stock trading and moving-object monitoring, demonstrates that our solution is significantly faster than the independent execution strategy while using only a small fraction of its memory. We also show that our solution scales to hundreds or even thousands of queries under high input data rates.
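The range query search sharing and the single compact structure for pattern-specific parameters can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions. It uses the common distance-based outlier definition, where an object is an outlier if it has fewer than k neighbors within distance r. It works on a static batch of points rather than a sliding window, and it uses a brute-force neighbor search. The function names (`shared_range_search`, `outliers_for_query`, `answer_queries`) are hypothetical. One range search per object at the largest requested radius records that object's sorted neighbor distances. Every query (r, k) with r no larger than that radius is then answered from the same structure by binary search.

```python
import bisect
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def shared_range_search(points: Sequence[Point], r_max: float) -> Dict[int, List[float]]:
    """Run ONE range search per object at the largest radius r_max.

    For every object we keep the sorted distances to all neighbors within r_max.
    Brute-force O(n^2) search, used here only for illustration.
    """
    neighbor_dists: Dict[int, List[float]] = {}
    for i, p in enumerate(points):
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        neighbor_dists[i] = sorted(d for d in dists if d <= r_max)
    return neighbor_dists


def outliers_for_query(neighbor_dists: Dict[int, List[float]], r: float, k: int) -> Set[int]:
    """Distance-based outliers: objects with fewer than k neighbors within r (requires r <= r_max)."""
    # bisect_right counts how many stored distances are <= r.
    return {i for i, dists in neighbor_dists.items() if bisect.bisect_right(dists, r) < k}


def answer_queries(
    points: Sequence[Point], queries: Sequence[Tuple[float, int]]
) -> Dict[Tuple[float, int], Set[int]]:
    """Share the range search across all (r, k) queries, then answer each from the same structure."""
    r_max = max(r for r, _ in queries)
    neighbor_dists = shared_range_search(points, r_max)
    return {(r, k): outliers_for_query(neighbor_dists, r, k) for r, k in queries}


if __name__ == "__main__":
    points = [(0.0,), (0.5,), (1.0,), (10.0,)]
    queries = [(0.6, 1), (0.6, 2), (1.0, 2)]
    print(answer_queries(points, queries))
    # → {(0.6, 1): {3}, (0.6, 2): {0, 2, 3}, (1.0, 2): {3}}
```

The neighbor search runs once regardless of how many (r, k) pairs are registered. Adding a query with a smaller radius or a different k costs only a binary search per object. A streaming version would also have to update the stored distance lists as objects enter and expire from the window. This sketch does not attempt that.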
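The metaquery strategy for window-specific parameters can be illustrated in the same hedged way. This sketch assumes time-based windows whose sizes and slides are positive integers. It picks the metaquery slide as the greatest common divisor of all member window sizes and slides, and the metaquery window as the largest member window. These choices, and all names (`WindowQuery`, `MetaQuery`, `build_meta_query`, `slice_stream`, `due_queries`, `window_contents`), are illustrative assumptions and not necessarily the paper's exact construction. Each arriving object is assigned to exactly one slice, and every member query reassembles its window from the most recent shared slices whenever its own slide boundary is reached.

```python
from dataclasses import dataclass
from functools import reduce
from math import gcd
from typing import Dict, Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class WindowQuery:
    name: str
    win: int    # window size in time units (hypothetical parameter)
    slide: int  # slide size in time units (hypothetical parameter)


@dataclass(frozen=True)
class MetaQuery:
    slice_len: int  # slide of the single metaquery
    max_win: int    # window of the metaquery: the largest member window


def build_meta_query(queries: Sequence[WindowQuery]) -> MetaQuery:
    """Assumed construction: slice length = gcd of all member windows and slides."""
    sizes = [q.win for q in queries] + [q.slide for q in queries]
    return MetaQuery(slice_len=reduce(gcd, sizes), max_win=max(q.win for q in queries))


def slice_stream(meta: MetaQuery, events: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Assign each (timestamp, value) event to one slice; slice i covers ((i-1)*L, i*L].

    Assumes positive integer timestamps.
    """
    slices: Dict[int, List[str]] = {}
    for ts, value in events:
        idx = -(-ts // meta.slice_len)  # ceiling division
        slices.setdefault(idx, []).append(value)
    return slices


def due_queries(meta: MetaQuery, queries: Sequence[WindowQuery], t: int) -> Dict[str, List[int]]:
    """At boundary t, list each reporting query and the slice ids forming its full window."""
    due: Dict[str, List[int]] = {}
    last = t // meta.slice_len  # id of the slice ending at t
    for q in queries:
        if t % q.slide == 0 and t >= q.win:  # report only complete windows
            n = q.win // meta.slice_len
            due[q.name] = list(range(last - n + 1, last + 1))
    return due


def window_contents(slices: Dict[int, List[str]], slice_ids: List[int]) -> List[str]:
    """Reassemble a member query's window from the shared slices."""
    return [v for i in slice_ids for v in slices.get(i, [])]


if __name__ == "__main__":
    qs = [WindowQuery("A", win=6, slide=2), WindowQuery("B", win=4, slide=4)]
    meta = build_meta_query(qs)  # MetaQuery(slice_len=2, max_win=6)
    slices = slice_stream(meta, [(1, "a"), (3, "b"), (5, "c"), (7, "d"), (8, "e")])
    for name, ids in due_queries(meta, qs, 8).items():
        print(name, ids, window_contents(slices, ids))
    # A [2, 3, 4] ['b', 'c', 'd', 'e']
    # B [3, 4] ['c', 'd', 'e']
```

The sketch covers only the window bookkeeping: objects are stored once in slices, and the two queries read overlapping slices instead of keeping separate copies. A full system would also maintain the neighbor-based patterns per slice and evict slices older than `max_win`. Both steps are omitted here.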