A clustering algorithm for multiple data streams based on spectral component similarity

  • Authors: Ling Chen; Ling-Jun Zou; Li Tu

  • Affiliations: Department of Computer Science, Yangzhou University, Yangzhou 225009, China and State Key Lab of Novel Software Tech, Nanjing University, Nanjing 210093, China; Department of Computer Science, Yangzhou University, Yangzhou 225009, China; Department of Computer Science, Jiangyin Polytechnic Institute, Jiangyin 214405, China

  • Venue: Information Sciences: an International Journal

  • Year: 2012

Abstract

We propose a new algorithm to cluster multiple and parallel data streams using spectral component similarity analysis, a new similarity metric. This new algorithm can effectively cluster data streams that show similar behaviour to each other but with unknown time delays. The algorithm performs auto-regressive modelling to measure the lag correlation between the data streams and uses it as the distance metric for clustering. The algorithm uses a sliding window model to continuously report the most recent clustering results and to dynamically adjust the number of clusters. Our experimental results on real and synthetic datasets show that our algorithm has better clustering quality, efficiency, and stability than other existing methods.
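
The general idea of clustering streams by lag-tolerant similarity over a sliding window can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in plain Pearson correlation over shifted windows in place of auto-regressive modelling and spectral component similarity, and groups streams with a simple distance threshold. All names and parameters (`lag_distance`, `cluster_window`, `max_lag`, `threshold`) are hypothetical and chosen for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def lag_distance(x, y, max_lag):
    """Assumed distance: 1 - max |correlation| over lags in [-max_lag, max_lag]."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[lag:], y[:len(y) - lag]
        else:
            a, b = x[:len(x) + lag], y[-lag:]
        best = max(best, abs(pearson(a, b)))
    return max(0.0, 1.0 - best)


def cluster_window(streams, window, max_lag, threshold):
    """Group streams whose lag distance on the latest `window` points is <= threshold.

    Uses union-find, so the number of clusters is not fixed in advance.
    """
    recent = {name: values[-window:] for name, values in streams.items()}
    names = sorted(recent)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if lag_distance(recent[a], recent[b], max_lag) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Synthetic example: s2 is s1 delayed by 5 steps, s3 oscillates at another frequency.
streams = {
    "s1": [math.sin(0.3 * t) for t in range(200)],
    "s2": [math.sin(0.3 * (t - 5)) for t in range(200)],
    "s3": [math.sin(1.1 * t) for t in range(200)],
}
clusters = cluster_window(streams, window=100, max_lag=10, threshold=0.2)
print(clusters)
```

In this synthetic example, `s1` and `s2` are nearly uncorrelated at zero lag but match once shifted by five steps, so they fall into the same group while `s3` stays separate. Calling `cluster_window` again as new points arrive gives a crude stand-in for reporting results over a sliding window.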