Continuous subspace clustering in streaming time series

Authors:
Maria Kontaki;Apostolos N. Papadopoulos;Yannis Manolopoulos
Affiliations:
Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece;Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece;Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece
Venue:
Information Systems
Year:
2008

Citing 21
Cited 2

Fast discovery of association rules

Advances in knowledge discovery and data mining
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Finding generalized projected clusters in high dimensional spaces

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Data mining: concepts and techniques

Data mining: concepts and techniques
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Data Mining: Introductory and Advanced Topics

Data Mining: Introductory and Advanced Topics
Clustering by pattern similarity in large data sets

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Continuous queries over data streams

ACM SIGMOD Record
Biclustering of Expression Data

Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Semantic Compression and Pattern Extraction with Fascicles

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Maintaining variance and k-medians over data stream windows

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Clustering Data Streams: Theory and Practice

IEEE Transactions on Knowledge and Data Engineering
Better streaming algorithms for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
d-Clusters: Capturing Subspace Correlation in a Large Data Set

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Clustering of streaming time series is meaningless

DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
MaPle: A Fast Algorithm for Maximal Pattern-based Clustering

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
A Fast Algorithm for Subspace Clustering by Pattern Similarity

SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
Streaming queries over streaming data

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
A framework for clustering evolving data streams

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Leadership discovery when data correlatively evolve

World Wide Web
A survey on enhanced subspace clustering

Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

Performing data mining tasks in streaming data is considered a challenging research direction, due to the continuous data evolution. In this work, we focus on the problem of clustering streaming time series, based on the sliding window paradigm. More specifically, we use the concept of subspace @a-clusters. A subspace @a-cluster consists of a set of streams, whose value difference is less than @a in a consecutive number of time instances (dimensions). The clusters can be continuously and incrementally updated as the streaming time series evolve with time. The proposed technique is based on a careful examination of pair-wise stream similarities for a subset of dimensions and then it is generalized for more streams per cluster. Additionally, we extend our technique in order to find maximal pClusters in consecutive dimensions that have been used in previously proposed clustering methods. Performance evaluation results, based on real-life and synthetic data sets, show that the proposed method is more efficient than existing techniques. Moreover, it is shown that the proposed pruning criteria are very important for search space reduction, and that the cost of incremental cluster monitoring is more computationally efficient that the re-clustering process.