On clustering large number of data streams

  • Authors:
  • Zaher Al Aghbari;Ibrahim Kamel;Thuraya Awad

  • Affiliations:
  • Department of Computer Science, University of Sharjah, Sharjah, United Arab Emirates;Department of Electrical and Computer Engineering, University of Sharjah, Sharjah, United Arab Emirates;Department of Computer Science, University of Sharjah, Sharjah, United Arab Emirates

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2012

Abstract

Data streams and their applications appear in several fields, such as physics, finance, medicine, and environmental science. As sensor technology improves, sensor data rates continue to increase. Consequently, analyzing data streams becomes ever more challenging. Fast online response is a must for applications that involve multiple data streams, especially when the number of data streams is large. This paper proposes an efficient clustering technique, called the Multi-way Grid-based join algorithm (MG-join), to find clusters in multiple data streams. The proposed algorithm uses the Discrete Fourier Transform (DFT) to reduce the dimensionality of the streams. Each stream is represented by a point in a multi-dimensional grid in the frequency domain, and the MG-join algorithm finds the clusters among the streams in that domain. Moreover, this paper proposes an incremental update mechanism that avoids recalculating the DFT coefficients when new readings arrive and thus minimizes the processing time. Experiments on synthetic data streams show that the proposed clustering technique is much faster than traditional clustering techniques while its accuracy is as good as theirs. This makes the proposed technique suitable for sensor network environments, where computing and power capabilities are limited.
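
The abstract does not give the details of MG-join or of its update rule, so the following is only a minimal sketch of the two general ideas it names: keeping the first few DFT coefficients of a sliding window as a low-dimensional summary, and updating those coefficients incrementally instead of recomputing them. The update shown is the standard sliding-DFT identity, which may differ from the authors' mechanism. The function names, the window length `n`, the number of coefficients `k`, and the `cell_size` grid resolution are all assumptions made for illustration, as is the choice to use the real and imaginary parts of the coefficients as grid coordinates.

```python
import numpy as np

def dft_features(window, k):
    """Return the first k DFT coefficients of a window (NumPy's unnormalized convention)."""
    return np.fft.fft(np.asarray(window, dtype=float))[:k]

def slide_update(coeffs, oldest, newest, n):
    """Update the leading DFT coefficients after a length-n window slides by one reading.

    Standard sliding-DFT identity: X'_j = (X_j - oldest + newest) * exp(2*pi*i*j/n).
    This is an assumed stand-in for the paper's incremental update.
    """
    j = np.arange(len(coeffs))
    return (coeffs - oldest + newest) * np.exp(2j * np.pi * j / n)

def grid_cell(coeffs, cell_size):
    """Map a coefficient vector to an integer grid cell (hypothetical grid layout)."""
    point = np.concatenate([coeffs.real, coeffs.imag])
    return tuple(int(v) for v in np.floor(point / cell_size))

# Usage: one stream, window of 32 readings, summarized by 4 coefficients.
rng = np.random.default_rng(0)
readings = rng.normal(size=33)
n, k = 32, 4

coeffs = dft_features(readings[:n], k)                    # initial window
coeffs = slide_update(coeffs, readings[0], readings[n], n)  # one new reading arrives
cell = grid_cell(coeffs, cell_size=0.5)                   # grid cell holding this stream
```

Each update costs O(k) per stream rather than the O(n log n) of a full transform, which is the kind of saving the abstract attributes to incremental updating. Streams that fall into the same or neighbouring cells would then be candidates for the same cluster; how MG-join actually joins cells is not specified in the abstract.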