A clustering algorithm for multiple data streams based on spectral component similarity

Authors:
Ling Chen; Ling-Jun Zou; Li Tu
Affiliations:
Department of Computer Science, Yangzhou University, Yangzhou 225009, China and State Key Lab of Novel Software Tech, Nanjing University, Nanjing 210093, China; Department of Computer Science, Yangzhou University, Yangzhou 225009, China; Department of Computer Science, Jiangyin Polytechnic Institute, Jiangyin 214405, China
Venue:
Information Sciences: an International Journal
Year:
2012


Abstract

We propose a new algorithm for clustering multiple parallel data streams based on spectral component similarity analysis, a new similarity metric. The algorithm can effectively cluster data streams that behave similarly but are offset from each other by unknown time delays. It uses auto-regressive modelling to measure the lag correlation between data streams and uses that correlation as the distance metric for clustering. A sliding window model lets it continuously report the most recent clustering results and dynamically adjust the number of clusters. Experimental results on real and synthetic datasets show that the algorithm achieves better clustering quality, efficiency, and stability than existing methods.
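
To make the idea of a lag-tolerant distance concrete, here is a minimal Python sketch. It is not the paper's method: the abstract describes auto-regressive modelling and spectral components, whereas this sketch swaps in a brute-force search over lagged Pearson correlations on a single window. The function names `pearson` and `lag_distance` and the parameter `max_lag` are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def lag_distance(x, y, max_lag):
    """Return (distance, best_lag) for two equal-length windows.

    distance is 1 minus the largest correlation found over lags in
    [-max_lag, max_lag], clamped at 0. A positive best_lag means that
    y trails x by that many samples. (Hypothetical helper, not from the paper.)
    """
    best_corr, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # Pair x[t] with y[t + lag].
            a, b = x[:len(x) - lag], y[lag:]
        else:
            # Pair x[t - lag] with y[t], i.e. x trails y.
            a, b = x[-lag:], y[:len(y) + lag]
        if len(a) < 2:
            continue  # too little overlap to correlate
        corr = pearson(a, b)
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return max(0.0, 1.0 - best_corr), best_lag


# Usage: y is x delayed by 5 samples, so the two streams should be
# treated as close even though they are misaligned in time.
x = [math.sin(0.3 * t) for t in range(60)]
y = [math.sin(0.3 * (t - 5)) for t in range(60)]
distance, lag = lag_distance(x, y, max_lag=10)
print(distance, lag)
```

In a streaming setting, `x` and `y` would be the contents of the current sliding window for two streams, and the resulting pairwise distances could be passed to any distance-based clustering routine. The exhaustive lag search costs time proportional to the window length times `max_lag` per pair, which is the kind of cost a model-based approach such as the one the abstract describes would aim to avoid.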