Neighbor-based pattern detection for windows over streaming data

Authors:
Di Yang; Elke A. Rundensteiner; Matthew O. Ward
Affiliations:
Worcester Polytechnic Institute, Worcester, MA; Worcester Polytechnic Institute, Worcester, MA; Worcester Polytechnic Institute, Worcester, MA
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

Abstract

The discovery of complex patterns such as clusters, outliers, and associations from huge volumes of streaming data has been recognized as critical for many domains. However, pattern detection with sliding window semantics, as required by applications ranging from stock market analysis to moving object tracking, remains largely unexplored. Applying static pattern detection algorithms from scratch to every window is prohibitively expensive due to their high algorithmic complexity. This work tackles this problem by developing the first solution for incremental detection of neighbor-based patterns specific to sliding window scenarios. The specific pattern types covered in this work include density-based clusters and distance-based outliers. Incremental pattern computation in highly dynamic streaming environments is challenging because purging a large amount of to-be-expired data from previously formed patterns may cause complex pattern changes, including migration, splitting, merging, and termination of these patterns. Previous incremental neighbor-based pattern detection algorithms, such as incremental DBSCAN, were typically not designed to handle sliding windows and cannot solve this problem efficiently in terms of both CPU and memory consumption. To overcome this, we exploit the "predictability" property of sliding windows to elegantly discount the effect of expiring objects on the remaining pattern structures. Our solution achieves minimal CPU utilization while keeping memory utilization linear in the number of objects in the window. Our comprehensive experimental study, using both synthetic and real data from the domains of stock trades and moving object monitoring, demonstrates the superiority of our proposed strategies over alternative methods in both CPU and memory utilization.
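
To make the problem setting concrete, the following is a minimal sketch of distance-based outlier detection over a count-based sliding window. It is not the algorithm proposed in the paper: it is a naive incremental baseline that keeps one neighbor count per object and adjusts the counts when an object arrives or expires, rather than recomputing every window from scratch. The class name, the parameter names (`window_size`, `radius`, `min_neighbors`), and the definition used here (an object is an outlier if fewer than `min_neighbors` other objects in the window lie within `radius` of it) are assumptions made for illustration.

```python
from collections import deque
from math import dist


class SlidingWindowOutliers:
    """Naive incremental distance-based outlier detection (illustrative only)."""

    def __init__(self, window_size, radius, min_neighbors):
        self.window_size = window_size      # number of objects kept in the window
        self.radius = radius                # neighbor distance threshold
        self.min_neighbors = min_neighbors  # fewer neighbors than this => outlier
        self.window = deque()               # entries [point, neighbor_count], oldest first

    def insert(self, point):
        # In a count-based window, objects expire in arrival order, so the
        # object to purge is always the one at the front of the deque.
        if len(self.window) == self.window_size:
            expired, _ = self.window.popleft()
            for entry in self.window:
                if dist(entry[0], expired) <= self.radius:
                    entry[1] -= 1  # the expired object no longer counts as a neighbor

        # Count the neighbors of the new object and update their counts in turn.
        count = 0
        for entry in self.window:
            if dist(entry[0], point) <= self.radius:
                entry[1] += 1
                count += 1
        self.window.append([point, count])

    def outliers(self):
        # Objects whose current neighbor count is below the threshold.
        return [p for p, c in self.window if c < self.min_neighbors]


if __name__ == "__main__":
    detector = SlidingWindowOutliers(window_size=3, radius=1.5, min_neighbors=1)
    for p in [(0, 0), (1, 0), (5, 5)]:
        detector.insert(p)
    print(detector.outliers())  # (5, 5) has no neighbor in the window

    detector.insert((5, 6))     # (0, 0) expires; (1, 0) loses its only neighbor
    print(detector.outliers())
```

Each insertion in this sketch scans the whole window, so it costs time linear in the window size per arriving object while memory stays linear in the number of objects. It also shows why expiration is the hard part: purging a single object can change the status of objects that remain in the window, which is the effect the abstract describes for outliers and, in more complex forms, for density-based clusters.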