Evaluating the intrinsic dimension of evolving data streams
Proceedings of the 2006 ACM symposium on Applied computing
A fast and effective method to find correlations among attributes in databases
Data Mining and Knowledge Discovery
Fractal dimension applied to plant identification
Information Sciences: an International Journal
Multifractal-based cluster hierarchy optimisation algorithm
International Journal of Business Intelligence and Data Mining
Measuring evolving data streams' behavior through their intrinsic dimension
New Generation Computing
K-means clustering versus validation measures: a data-distribution perspective
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Augmenting transportation-related recommendations through data mining
International Journal of Advanced Intelligence Paradigms
A modified fuzzy c-means algorithm for association rules clustering
ICIC'06 Proceedings of the 2006 international conference on Intelligent computing: Part II
Clustering is a widely used knowledge discovery technique. It helps uncover structures in data that were not previously known. The clustering of large data sets has received a lot of attention in recent years. However, clustering is still a challenging task, because many published algorithms fail in one of three ways. Some fail to scale well with the size of the data set and the number of dimensions that describe the points. Some fail to find clusters of arbitrary shape. Others fail to deal effectively with noise. In this paper, we present a new clustering algorithm based on the self-similarity properties of data sets. Self-similarity is the property of being invariant with respect to the scale used to look at the data set. Fractals are self-similar at every scale, and many data sets exhibit self-similarity over a range of scales. Self-similarity can be measured using the fractal dimension. The new algorithm, which we call Fractal Clustering (FC), places points incrementally. Each point goes into the cluster whose fractal dimension changes least when the point is added. This is a natural way of clustering points, since points in the same cluster have a great degree of self-similarity among them (and much less self-similarity with respect to points in other clusters). FC requires one scan of the data and is incremental. It can also be suspended at will, and it then provides the best answer possible at that point. We show via experiments that FC effectively deals with large data sets, high dimensionality and noise, and that it is capable of recognizing clusters of arbitrary shape.
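The abstract describes FC only at a high level. The following Python code is a minimal sketch of the core assignment rule: put each point where it changes a cluster's fractal dimension the least. It rests on several assumptions of our own. The fractal dimension is estimated with a simple box-counting fit over a fixed set of grid sizes (`DEFAULT_SIDES`). Coordinates are assumed to be rescaled into [0, 1). Initial clusters are given rather than computed. The noise threshold `tau` is a hypothetical parameter. The names `FractalCluster`, `box_counting_slope` and `fc_assign` are illustrative and are not the authors' code. The paper's actual estimator, seeding and bookkeeping may differ.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

# Hypothetical grid: cell side lengths used for box counting, assuming every
# coordinate has already been rescaled into [0, 1).
DEFAULT_SIDES = (1 / 4, 1 / 8, 1 / 16, 1 / 32)


def box_counting_slope(sides: Sequence[float], counts: Sequence[int]) -> float:
    """Least-squares slope of log N(s) against log(1/s): a box-counting dimension."""
    xs = [math.log(1 / s) for s in sides]
    ys = [math.log(n) for n in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


class FractalCluster:
    """A cluster summarised only by its occupied grid cells at each scale."""

    def __init__(self, points: Sequence[Point] = (),
                 sides: Sequence[float] = DEFAULT_SIDES):
        self.sides = tuple(sides)
        self.boxes = [Counter() for _ in self.sides]  # per scale: cell -> point count
        self.size = 0
        for p in points:
            self.add(p)

    def _cells(self, p: Point) -> List[Tuple[int, ...]]:
        # Index of the grid cell containing p, at every scale.
        return [tuple(int(x // s) for x in p) for s in self.sides]

    def add(self, p: Point) -> None:
        for boxes, cell in zip(self.boxes, self._cells(p)):
            boxes[cell] += 1
        self.size += 1

    def dimension(self, extra: Optional[Point] = None) -> float:
        """Box-counting dimension, optionally as if `extra` had been added."""
        if self.size == 0 and extra is None:
            return 0.0
        counts = [len(b) for b in self.boxes]  # occupied cells per scale
        if extra is not None:
            for i, cell in enumerate(self._cells(extra)):
                if cell not in self.boxes[i]:
                    counts[i] += 1  # extra would occupy a new cell at this scale
        return box_counting_slope(self.sides, counts)


def fc_assign(clusters: List[FractalCluster], p: Point,
              tau: Optional[float] = None) -> int:
    """Add p to the cluster whose dimension changes least; return its index, or -1."""
    changes = [abs(c.dimension(p) - c.dimension()) for c in clusters]
    best = min(range(len(clusters)), key=changes.__getitem__)
    if tau is not None and changes[best] > tau:
        return -1  # hypothetical noise rule: no cluster absorbs p cheaply
    clusters[best].add(p)
    return best
```

Each point is examined once, and clusters keep only per-scale cell counts rather than their member points. This is one way to respect the single-scan, incremental behaviour the abstract claims. A point that falls into cells a cluster already occupies at every scale leaves that cluster's dimension unchanged, so such a point is naturally drawn toward it.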