BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Computing depth contours of bivariate point clouds
Computational Statistics & Data Analysis - Special issue on classification
CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
LOF: identifying density-based local outliers
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Incremental Clustering for Mining in a Data Warehousing Environment
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Algorithms for Mining Distance-Based Outliers in Large Datasets
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Finding Intensional Knowledge of Distance-Based Outliers
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Maintaining variance and k-medians over data stream windows
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Clustering Data Streams: Theory and Practice
IEEE Transactions on Knowledge and Data Engineering
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
The CQL continuous query language: semantic foundations and query execution
The VLDB Journal — The International Journal on Very Large Data Bases
Online clustering of parallel data streams
Data & Knowledge Engineering
Adaptive Clustering for Multiple Evolving Streams
IEEE Transactions on Knowledge and Data Engineering
Online outlier detection in sensor data using non-parametric models
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Density-based clustering for real-time stream data
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
A framework for clustering evolving data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Detecting distance-based outliers in streams of data
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Detection of Local Outlier over Dynamic Data Streams Using Efficient Partitioning Method
CSIE '09 Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering - Volume 04
Outlier detection in graph streams
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Clustering distributed data streams in peer-to-peer environments
Information Sciences: an International Journal
A similarity-based approach for data stream classification
Expert Systems with Applications: An International Journal
Hi-index | 0.00 |
Discovery of complex patterns such as clusters, outliers, and associations from huge volumes of streaming data has been recognized as critical for many application domains. However, little research effort has been made toward detecting patterns within sliding window semantics as required by real-time monitoring tasks, ranging from real time traffic monitoring to stock trend analysis. Applying static pattern detection algorithms from scratch to every window is impractical due to their high algorithmic complexity and the real-time responsiveness required by streaming applications. In this work, we develop methods for the incremental detection of neighbor-based patterns, in particular, density-based clusters and distance-based outliers over sliding stream windows. Incremental computation for pattern detection queries is challenging. This is because purging of to-be-expired data from previously formed patterns may cause birth, shrinkage, splitting or termination of these complex patterns. To overcome this, we exploit the ''predictability'' property of sliding windows to elegantly discount the effect of expired objects with little maintenance cost. Our solution achieves guaranteed minimal CPU consumption, while keeping the memory utilization linear in the number of objects in the window. To thoroughly analyze the performance of our proposed methods, we develop a cost model characterizing the performance of our proposed neighbor-based pattern mining strategies. We conduct an analysis study to not only identify the key performance factors for each strategy but also show under which conditions each of them are most efficient. Our comprehensive experimental study, using both synthetic and real data from domains of moving object monitoring and stock trades, demonstrates superiority of our proposed strategies over alternate methods in both CPU processing resources and in memory utilization.