A Grid and Fractal Dimension-Based Data Stream Clustering Algorithm

  • Authors:
  • Guopin Lin; Leisong Chen

  • Venue:
  • ISISE '08 Proceedings of the 2008 International Symposium on Information Science and Engineering - Volume 01
  • Year:
  • 2008

Abstract

The data stream problem has been studied extensively in recent years because of the great ease with which stream data can be collected. The nature of stream data makes it essential to use algorithms that require only one pass over the data, and single-scan stream analysis methods have been proposed in this context. However, clustering remains a challenging task, because many published algorithms fail to scale well with the size of the data stream and the number of dimensions describing each point, fail to find clusters of arbitrary shape, or fail to deal effectively with noise. In this paper, we propose a new data stream clustering approach called GFDStream (Grid and Fractal Dimension-Based Data Stream Clustering). The method incorporates a grid method, the Fractal Clustering methodology [1], and the clustering framework of [2], which divides the clustering process into an online component that periodically stores detailed summary statistics and an offline component that uses only these summary statistics, together with the concept of a pyramidal time frame in conjunction with a micro-clustering approach. GFDStream uses the fractal dimension of the grid cells as a parameter and partitions the data space into a grid, which can improve the processing speed of the algorithm. Because points in the same cluster have a high degree of self-similarity among themselves (and much less self-similarity with respect to points in other clusters), the method can separate the data better. We show via experiments that GFDStream deals effectively with data streams and is capable of recognizing clusters of arbitrary shape.
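The abstract does not spell out GFDStream's exact procedure. As a rough illustration of the fractal-clustering idea it builds on [1] (points in one cluster are self-similar), the Python sketch below estimates a box-counting dimension by counting occupied grid cells at several resolutions, then adds a new point to whichever cluster's estimated dimension it changes least. This is one common way to operationalise that idea, not the authors' implementation. Points are assumed to lie in the unit cube [0, 1)^d, and `box_counting_dimension`, `assign_point` and `levels` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def box_counting_dimension(points: Sequence[Point], levels: int = 4) -> float:
    """Estimate the box-counting dimension of points assumed to lie in [0, 1)^d.

    At level k the unit cube is cut into cells of side 2**-k and N_k is the
    number of occupied cells; the dimension is the least-squares slope of
    log2(N_k) against k = log2(1 / side).
    """
    if levels < 2:
        raise ValueError("need at least two grid levels to fit a slope")
    if not points:
        return 0.0
    ks: List[float] = []
    log_counts: List[float] = []
    for k in range(1, levels + 1):
        cells_per_axis = 2 ** k
        # Map each point to its grid cell; clamp so a coordinate equal to 1.0
        # still lands in the last cell.
        occupied = {
            tuple(min(int(c * cells_per_axis), cells_per_axis - 1) for c in p)
            for p in points
        }
        ks.append(float(k))
        log_counts.append(math.log2(len(occupied)))
    mean_k = sum(ks) / len(ks)
    mean_n = sum(log_counts) / len(log_counts)
    num = sum((k - mean_k) * (n - mean_n) for k, n in zip(ks, log_counts))
    den = sum((k - mean_k) ** 2 for k in ks)
    return num / den


def assign_point(clusters: List[List[Point]], p: Point, levels: int = 4) -> int:
    """Hypothetical fractal-clustering step: put p in the cluster whose
    estimated dimension changes least when p is added; return its index."""
    if not clusters:
        raise ValueError("need at least one cluster")
    best_index, best_delta = 0, math.inf
    for i, members in enumerate(clusters):
        before = box_counting_dimension(members, levels)
        after = box_counting_dimension(members + [p], levels)
        delta = abs(after - before)
        if delta < best_delta:
            best_index, best_delta = i, delta
    clusters[best_index].append(p)
    return best_index
```

For example, points spread along a diagonal line give an estimate of 1.0, while points filling a square region give 2.0. Recomputing the dimension from scratch for every candidate cluster costs time proportional to the cluster size; a streaming variant would instead keep the set of occupied cells per level, so the effect of one new point can be checked with one lookup per level.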
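The online/offline split taken from [2] relies on summaries that can be updated in a single pass and combined later without the raw points. A minimal Python sketch of grid-cell summaries of that kind follows, assuming a fixed cell width and, per cell, a count, a per-dimension linear sum and squared sum, and a last-update time (in the spirit of micro-cluster feature vectors). `CellSummary`, `GridSummary`, `cell_width` and `min_count` are hypothetical names, the pyramidal time frame is not modelled, and the abstract does not specify GFDStream's actual data structure.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


@dataclass
class CellSummary:
    """Additive per-cell statistics (hypothetical; not the paper's structure)."""
    count: int = 0
    linear_sum: List[float] = field(default_factory=list)
    square_sum: List[float] = field(default_factory=list)
    last_update: int = 0

    def _ensure_dims(self, d: int) -> None:
        # Allocate the per-dimension sums lazily, on first use.
        if not self.linear_sum:
            self.linear_sum = [0.0] * d
            self.square_sum = [0.0] * d

    def add(self, point: Sequence[float], t: int) -> None:
        """Online update: O(d) work per arriving point, no raw data kept."""
        self._ensure_dims(len(point))
        self.count += 1
        for j, x in enumerate(point):
            self.linear_sum[j] += x
            self.square_sum[j] += x * x
        self.last_update = max(self.last_update, t)

    def absorb(self, other: "CellSummary") -> None:
        """Summaries are additive, so merging two cells is a component-wise sum."""
        if other.count == 0:
            return
        self._ensure_dims(len(other.linear_sum))
        self.count += other.count
        for j in range(len(other.linear_sum)):
            self.linear_sum[j] += other.linear_sum[j]
            self.square_sum[j] += other.square_sum[j]
        self.last_update = max(self.last_update, other.last_update)

    def centroid(self) -> List[float]:
        return [s / self.count for s in self.linear_sum]


class GridSummary:
    """Online component: map each point to a fixed-width cell and update its summary."""

    def __init__(self, cell_width: float) -> None:
        self.cell_width = cell_width
        self.cells: Dict[Cell, CellSummary] = {}

    def cell_of(self, point: Sequence[float]) -> Cell:
        return tuple(math.floor(x / self.cell_width) for x in point)

    def insert(self, point: Sequence[float], t: int) -> Cell:
        key = self.cell_of(point)
        self.cells.setdefault(key, CellSummary()).add(point, t)
        return key

    def dense_cells(self, min_count: int) -> List[Cell]:
        """An offline step reads only the summaries, e.g. to keep dense cells."""
        return [key for key, s in self.cells.items() if s.count >= min_count]
```

Because every field except the timestamp is a plain sum, an offline step can merge neighbouring cells with `absorb` without revisiting the stream; storing snapshots for a pyramidal time frame would have to be layered on top of this sketch.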