A top-down approach for density-based clustering using multidimensional indexes

  • Authors:
  • Jae-Joon Hwang;Kyu-Young Whang;Yang-Sae Moon;Byung-Suk Lee

  • Affiliations:
  • Department of Computer Science and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, 373-1, Kusong-Dong, Yusong-Gu, Daejeon 305-701, South Korea;Department of Computer Science and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, 373-1, Kusong-Dong, Yusong-Gu, Daejeon 305-701, South Korea;Department of Computer Science and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, 373-1, Kusong-Dong, Yusong-Gu, Daejeon 305-701, South Korea;Department of Computer Science, University of Vermont, Burlington, VT

  • Venue:
  • Journal of Systems and Software - Special issue: Performance modeling and analysis of computer systems and networks
  • Year:
  • 2004

Abstract

Clustering on large databases has been studied actively as an increasing number of applications involve huge amounts of data. In this paper, we propose an efficient top-down approach for density-based clustering that uses the density information stored in the index nodes of a multidimensional index. We first provide a formal definition of the cluster based on the concept of region contrast partition. Based on this notion, we propose a novel top-down clustering algorithm that improves efficiency through branch-and-bound pruning. For this pruning, we present a technique for determining the bounds based on sparse and dense internal regions and formally prove the correctness of the bounds. Experimental results show that the proposed method reduces the elapsed time by a factor of up to 96 compared with BIRCH, a well-known clustering method. The results also show that the performance improvement becomes more marked as the size of the database increases.
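
The general idea of top-down traversal with pruning can be illustrated with a minimal sketch. It is not the algorithm from the paper: the `Node` class, the `dense_regions` function, and the two thresholds `low` and `high` are hypothetical names introduced here, and the thresholds are simple stand-ins for the formally proven bounds that the abstract mentions. The sketch assumes a two-dimensional index in which every node stores its bounding box and the number of points it covers. Note that pruning on average density alone can miss a small dense area inside a sparse node, which is one reason proven bounds are needed in practice.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    """Hypothetical index node: a bounding box plus the point count it covers."""
    box: Box
    count: int
    children: List["Node"] = field(default_factory=list)

    def density(self) -> float:
        xmin, ymin, xmax, ymax = self.box
        area = (xmax - xmin) * (ymax - ymin)
        return self.count / area if area > 0 else 0.0


def dense_regions(node: Node, low: float, high: float,
                  visited: Optional[List[Box]] = None) -> List[Box]:
    """Collect boxes judged dense, descending only into ambiguous nodes.

    `low` and `high` are illustrative thresholds (an assumption of this
    sketch), not the bounds derived in the paper.
    """
    if visited is not None:
        visited.append(node.box)  # record the node so pruning can be observed
    d = node.density()
    if d >= high:
        return [node.box]         # clearly dense: accept without descending
    if d < low or not node.children:
        return []                 # clearly sparse, or an ambiguous leaf: drop
    result: List[Box] = []
    for child in node.children:   # ambiguous internal node: refine top-down
        result.extend(dense_regions(child, low, high, visited))
    return result


# Small hand-built index: one dense quadrant with four sub-cells, three sparse ones.
dense_quadrant = Node((0.0, 0.0, 2.0, 2.0), 36, [
    Node((0.0, 0.0, 1.0, 1.0), 9),
    Node((1.0, 0.0, 2.0, 1.0), 9),
    Node((0.0, 1.0, 1.0, 2.0), 9),
    Node((1.0, 1.0, 2.0, 2.0), 9),
])
root = Node((0.0, 0.0, 4.0, 4.0), 40, [
    dense_quadrant,
    Node((2.0, 0.0, 4.0, 2.0), 0),
    Node((0.0, 2.0, 2.0, 4.0), 2),
    Node((2.0, 2.0, 4.0, 4.0), 2),
])

visited_boxes: List[Box] = []
regions = dense_regions(root, low=1.0, high=5.0, visited=visited_boxes)
print(regions, len(visited_boxes))
```

In this toy index the root is ambiguous, so its four children are examined; the dense quadrant is accepted as a whole and its four sub-cells are never visited, which is where the saving from pruning comes from.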