Gradual Clustering Algorithms

Authors:
Fei Wu; Georges Gardarin
Venue:
DASFAA '01 Proceedings of the 7th International Conference on Database Systems for Advanced Applications
Year:
2001

Abstract

Clustering is an important technique in data mining. Its objective is to group objects into clusters so that objects within a cluster are more similar to each other than to objects in different clusters. Similarity between two objects is measured by a distance function, such as the Euclidean distance, that satisfies the triangle inequality. Distance calculations are computationally expensive, and many algorithms have been proposed to reduce their cost.

This paper considers the gradual clustering problem. In practice, we have observed that users often begin clustering on a small number of attributes, e.g., two. If the result is partially satisfactory, they continue clustering on a larger number of attributes, e.g., ten. We refer to this as the gradual clustering problem. It can be viewed as vertically incremental clustering, since attributes (columns) rather than objects (rows) are added between runs. We propose approaches to solve this problem. The main idea is to reduce the number of distance calculations by exploiting the triangle inequality. Our method first stores in an index the distances between a representative object and the other objects in n-dimensional space. These pre-computed distances are then used to avoid distance calculations in (n+m)-dimensional space. Two experiments on real data sets demonstrate the added value of our approaches. The implemented algorithms are based on the DBSCAN algorithm, with an M-Tree as the associated index tree. However, the same principles can be integrated with other tree structures, such as the MVP-Tree and the R*-Tree, and with other clustering algorithms.
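The abstract does not spell out the exact pruning rule, so what follows is only a minimal sketch of one way the idea could work. It assumes Euclidean distance and a single representative (pivot) object, and it omits the M-Tree and the full DBSCAN loop. The helper names (`euclid`, `extend_pivot_distances`, `range_query`) and the toy data are hypothetical. Two facts make the sketch sound. First, squared Euclidean distance is a sum over attributes, so a cached n-dimensional pivot distance can be extended to n+m dimensions by visiting only the m new attributes. Second, by the triangle inequality, d(x, y) >= |d(x, p) - d(y, p)| for any pivot p, so an object whose pivot distance differs from the query's by more than eps cannot lie in the query's eps-neighborhood and needs no full distance computation.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclid(a: Point, b: Point) -> float:
    """Euclidean distance over all coordinates of a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def extend_pivot_distances(
    points: List[Point], pivot: Point, cached_n: List[float], n: int
) -> List[float]:
    """Extend cached n-dimensional pivot distances to all dimensions (hypothetical helper).

    Squared Euclidean distance is a sum over attributes, so only the
    newly added attributes (index n onward) are visited here.
    """
    extended = []
    for p, d_n in zip(points, cached_n):
        extra = sum((p[i] - pivot[i]) ** 2 for i in range(n, len(pivot)))
        extended.append(math.sqrt(d_n ** 2 + extra))
    return extended


def range_query(
    points: List[Point], pivot_dist: List[float], q: int, eps: float
) -> Tuple[List[int], int]:
    """Indices within eps of points[q], plus the count of full distance computations."""
    neighbors: List[int] = []
    computed = 0
    for j, p in enumerate(points):
        # Triangle inequality: d(q, j) >= |d(q, pivot) - d(j, pivot)|.
        if abs(pivot_dist[q] - pivot_dist[j]) > eps:
            continue  # cannot be within eps, so skip the full computation
        computed += 1
        if euclid(points[q], p) <= eps:
            neighbors.append(j)
    return neighbors, computed


# Hypothetical toy data with three attributes; the user first clustered on n = 2.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 1.0), (5.0, 6.0, 1.0), (9.0, 0.0, 2.0)]
pivot = pts[0]
n = 2
cached = [euclid(p[:n], pivot[:n]) for p in pts]  # stored during the 2-D run
full = extend_pivot_distances(pts, pivot, cached, n)  # reused for the 3-D run
nb, computed = range_query(pts, full, q=2, eps=1.5)
print(nb, computed)  # [2, 3] 2
```

In this toy run, the query on object 2 performs two full distance computations instead of five. The pruning never changes which neighbors are returned, only how much work is done, which is why a rule of this kind can sit underneath the eps-neighborhood queries that DBSCAN issues.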