Gradual Clustering Algorithms

  • Authors:
  • Fei Wu;Georges Gardarin

  • Venue:
  • DASFAA '01 Proceedings of the 7th International Conference on Database Systems for Advanced Applications
  • Year:
  • 2001

Abstract

Clustering is an important technique in data mining. Its objective is to group objects into clusters such that objects within a cluster are more similar to each other than to objects in different clusters. The similarity between two objects is defined by a distance function, e.g., the Euclidean distance, which satisfies the triangle inequality. Distance calculation is computationally very expensive, and many algorithms have been proposed to reduce this cost. This paper considers the gradual clustering problem. In practice, we noticed that users often begin clustering on a small number of attributes, e.g., two. If the result is partially satisfying, they continue clustering on a larger number of attributes, e.g., ten. We refer to this as the gradual clustering problem; it can be viewed as vertically incremental clustering. We propose approaches to solve this problem. The main idea is to reduce the number of distance calculations by using the triangle inequality. Our method first stores in an index the distances between a representative object and the objects in n-dimensional space. These pre-computed distances are then used to avoid distance calculations in (n+m)-dimensional space. Two experiments on real data sets demonstrate the added value of our approaches. The implemented algorithms are based on the DBSCAN algorithm with an associated M-Tree as the index tree. However, the principles of our idea can be integrated with other tree structures, such as the MVP-Tree and R*-Tree, and with other clustering algorithms.
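
The pruning idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it scans a flat list instead of an M-Tree, it assumes Euclidean distance with the first n attributes being those used in the earlier clustering run, and the names `euclidean`, `range_query`, and `ref_dists` are hypothetical. It relies on two facts: the Euclidean distance over a subset of attributes never exceeds the distance over all attributes, and the triangle inequality gives |d_n(r, p) - d_n(r, q)| <= d_n(p, q) for a representative object r. Together they yield a lower bound on the (n+m)-dimensional distance that costs only a subtraction.

```python
import math


def euclidean(a, b, dims):
    """Euclidean distance restricted to the first `dims` attributes."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(dims)))


def range_query(points, query_idx, eps, ref_dists):
    """Find all points within `eps` of the query point in the full space.

    `ref_dists[i]` is the pre-computed n-dimensional distance from a
    representative object to points[i] (hypothetical layout; the paper
    stores such distances in an index tree).
    Returns (neighbour indices, number of full distance computations).
    """
    full_dims = len(points[0])
    q = points[query_idx]
    neighbours, computed = [], 0
    for i, p in enumerate(points):
        # Lower bound: |d_n(r,p) - d_n(r,q)| <= d_n(p,q) <= d_full(p,q).
        # If the bound already exceeds eps, p cannot be a neighbour.
        if abs(ref_dists[i] - ref_dists[query_idx]) > eps:
            continue
        computed += 1
        if euclidean(p, q, full_dims) <= eps:
            neighbours.append(i)
    return neighbours, computed


# Toy data: 3 attributes in total, of which the first n = 2 were
# used in the earlier, lower-dimensional clustering run.
points = [
    (0.0, 0.0, 0.0),
    (0.5, 0.0, 0.3),
    (5.0, 5.0, 0.1),
    (0.2, 0.1, 4.0),
]
n = 2
representative = points[0]  # assumption: any object can serve as representative
ref_dists = [euclidean(representative, p, n) for p in points]

neighbours, computed = range_query(points, 0, 1.0, ref_dists)
print(neighbours, computed)
```

In this toy run, the third point is discarded from the stored distances alone, so only three of the four full-space distances are computed. The bound is conservative: the fourth point passes the filter but is rejected by the exact distance, so the result always matches a brute-force scan.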