A Grid and Fractal Dimension-Based Data Stream Clustering Algorithm

Authors:
Guopin Lin;Leisong Chen
Affiliations:
-;-
Venue:
ISISE '08 Proceedings of the 2008 International Symposium on Information Science and Engineering - Volume 01
Year:
2008


Abstract

The data stream problem has been studied extensively in recent years because of the great increase in the collection of stream data. The nature of stream data makes it essential to use algorithms that require only one pass over the data, and single-scan stream analysis methods have been proposed in this context. However, clustering is still a challenging task, since many published algorithms fail to scale well with the size of the data stream and the number of dimensions that describe each point, to find clusters of arbitrary shape, or to deal effectively with noise. In this paper, we propose a new data stream clustering approach called GFDStream (A Grid and Fractal Dimension-Based Data Stream Clustering). The method incorporates a grid method and the Fractal Clustering methodology [1]. It follows the clustering idea in [2], which divides the clustering process into an online component that periodically stores detailed summary statistics and an offline component that uses only these summary statistics, together with the concept of a pyramidal time frame in conjunction with a micro-clustering approach. The method uses the fractal dimension of the grids as a parameter and handles the data space by gridding, which can improve the processing speed of the algorithm. Since points in the same cluster have a great degree of self-similarity among them (and much less self-similarity with respect to points in other clusters), the method can distinguish the data better. We show via experiments that GFDStream deals effectively with data streams and is capable of recognizing clusters of arbitrary shape.
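
To make the idea of a grid-based fractal dimension concrete, the following is a minimal sketch of box counting, one common way to estimate a fractal dimension from occupied grid cells. It is an illustration only: the abstract does not say which estimator or grid sizes GFDStream uses, so the function names (box_count, box_counting_dimension), the cell sizes, and the synthetic test data are all assumptions.

```python
import math
import random

def box_count(points, cell_size):
    """Count the grid cells of side `cell_size` that hold at least one point."""
    occupied = {tuple(math.floor(x / cell_size) for x in p) for p in points}
    return len(occupied)

def box_counting_dimension(points, cell_sizes):
    """Estimate the box-counting dimension as the least-squares slope of
    log N(r) against log(1/r), where N(r) is the number of occupied cells."""
    xs = [math.log(1.0 / r) for r in cell_sizes]
    ys = [math.log(box_count(points, r)) for r in cell_sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical data: a diagonal line segment and a filled unit square.
rng = random.Random(0)
line = [(t, t) for t in (rng.random() for _ in range(20000))]
square = [(rng.random(), rng.random()) for _ in range(20000)]

# Assumed grid resolutions; a real stream algorithm would choose its own.
sizes = [1 / 4, 1 / 8, 1 / 16, 1 / 32]
dim_line = box_counting_dimension(line, sizes)
dim_square = box_counting_dimension(square, sizes)
print(round(dim_line, 2), round(dim_square, 2))
```

In this sketch the line segment yields an estimate close to 1 and the filled square an estimate close to 2, which illustrates why point sets with different self-similarity can be told apart by their fractal dimension. Because only the set of occupied cells is kept, the count can be maintained in a single pass over the points.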