Mining neighbor-based patterns in data streams

Authors:
Di Yang;Elke A. Rundensteiner;Matthew O. Ward
Affiliations:
1 Oracle Dr, Nashua, NH 03062, United States;WPI, United States;WPI, United States
Venue:
Information Systems
Year:
2013

Citing 22
Cited 1

BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Computing depth contours of bivariate point clouds

Computational Statistics & Data Analysis - Special issue on classification
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
OPTICS: ordering points to identify the clustering structure

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
LOF: identifying density-based local outliers

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Incremental Clustering for Mining in a Data Warehousing Environment

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Algorithms for Mining Distance-Based Outliers in Large Datasets

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Finding Intensional Knowledge of Distance-Based Outliers

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Maintaining variance and k-medians over data stream windows

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Clustering Data Streams: Theory and Practice

IEEE Transactions on Knowledge and Data Engineering
Clustering data streams

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
The CQL continuous query language: semantic foundations and query execution

The VLDB Journal — The International Journal on Very Large Data Bases
Online clustering of parallel data streams

Data & Knowledge Engineering
Adaptive Clustering for Multiple Evolving Streams

IEEE Transactions on Knowledge and Data Engineering
Online outlier detection in sensor data using non-parametric models

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Density-based clustering for real-time stream data

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
A framework for clustering evolving data streams

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Detecting distance-based outliers in streams of data

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Detection of Local Outlier over Dynamic Data Streams Using Efficient Partitioning Method

CSIE '09 Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering - Volume 04
Outlier detection in graph streams

ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Clustering distributed data streams in peer-to-peer environments

Information Sciences: an International Journal

A similarity-based approach for data stream classification

Expert Systems with Applications: An International Journal

Quantified Score

Hi-index	0.00

Visualization

Abstract

Discovery of complex patterns such as clusters, outliers, and associations from huge volumes of streaming data has been recognized as critical for many application domains. However, little research effort has been made toward detecting patterns within sliding window semantics as required by real-time monitoring tasks, ranging from real time traffic monitoring to stock trend analysis. Applying static pattern detection algorithms from scratch to every window is impractical due to their high algorithmic complexity and the real-time responsiveness required by streaming applications. In this work, we develop methods for the incremental detection of neighbor-based patterns, in particular, density-based clusters and distance-based outliers over sliding stream windows. Incremental computation for pattern detection queries is challenging. This is because purging of to-be-expired data from previously formed patterns may cause birth, shrinkage, splitting or termination of these complex patterns. To overcome this, we exploit the ''predictability'' property of sliding windows to elegantly discount the effect of expired objects with little maintenance cost. Our solution achieves guaranteed minimal CPU consumption, while keeping the memory utilization linear in the number of objects in the window. To thoroughly analyze the performance of our proposed methods, we develop a cost model characterizing the performance of our proposed neighbor-based pattern mining strategies. We conduct an analysis study to not only identify the key performance factors for each strategy but also show under which conditions each of them are most efficient. Our comprehensive experimental study, using both synthetic and real data from domains of moving object monitoring and stock trades, demonstrates superiority of our proposed strategies over alternate methods in both CPU processing resources and in memory utilization.