A top-down approach for density-based clustering using multidimensional indexes

Authors:
Jae-Joon Hwang;Kyu-Young Whang;Yang-Sae Moon;Byung-Suk Lee
Affiliations:
Department of Computer Science and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, 373-1, Kusong-Dong, Yusong-Gu, Daejeon 305-701, South Korea;Department of Computer Science and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, 373-1, Kusong-Dong, Yusong-Gu, Daejeon 305-701, South Korea;Department of Computer Science and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, 373-1, Kusong-Dong, Yusong-Gu, Daejeon 305-701, South Korea;Department of Computer Science, University of Vermont, Burlington, VT
Venue:
Journal of Systems and Software - Special issue: Performance modeling and analysis of computer systems and networks
Year:
2004

Citing 16
Cited 2

A linear-time probabilistic counting algorithm for database applications

ACM Transactions on Database Systems (TODS)
The buddy tree: an efficient and robust access method for spatial data base

Proceedings of the sixteenth international conference on Very large databases
BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
CURE: an efficient clustering algorithm for large databases

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
OPTICS: ordering points to identify the clustering structure

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Density biased sampling: an improved method for data mining and clustering

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Data bubbles: quality preserving performance boosting for hierarchical clustering

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Chameleon: Hierarchical Clustering Using Dynamic Modeling

Computer
G-Tree: A New Data Structure for Organizing Multidimensional Data

IEEE Transactions on Knowledge and Data Engineering
Data Mining: An Overview from a Database Perspective

IEEE Transactions on Knowledge and Data Engineering
A Region Splitting Strategy for Physical Database Design of Multidimensional File Organizations

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Efficient and Effective Clustering Methods for Spatial Data Mining

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification

SSD '95 Proceedings of the 4th International Symposium on Advances in Spatial Databases
Clustering Large Datasets in Arbitrary Metric Spaces

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Grid-Clustering: An Efficient Hierarchical Clustering Method for Very Large Data Sets

ICPR '96 Proceedings of the 13th International Conference on Pattern Recognition - Volume 2

Accelerating k-medoid-based algorithms through metric access methods

Journal of Systems and Software
Mining Meaningful Student Groups Based on Communication History Records

KES '07 Knowledge-Based Intelligent Information and Engineering Systems and the XVII Italian Workshop on Neural Networks on Proceedings of the 11th International Conference

Quantified Score

Hi-index	0.01

Abstract

Clustering on large databases has been studied actively as an increasing number of applications involve huge amounts of data. In this paper, we propose an efficient top-down approach for density-based clustering that uses the density information stored in the index nodes of a multidimensional index. We first provide a formal definition of a cluster based on the concept of region contrast partition. Based on this notion, we propose a novel top-down clustering algorithm that improves efficiency through branch-and-bound pruning. For this pruning, we present a technique for determining the bounds based on sparse and dense internal regions, and we formally prove the correctness of the bounds. Experimental results show that the proposed method reduces the elapsed time by up to 96 times compared with BIRCH, a well-known clustering method. The results also show that the performance improvement becomes more marked as the size of the database increases.
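
The general idea of pruning a top-down search with counts stored in index nodes can be illustrated with a minimal sketch. It is not the algorithm of the paper: it does not implement region contrast partition or the paper's bounds for sparse and dense internal regions. It assumes a toy fixed-depth quadtree in place of a real multidimensional index, and the names `Node`, `build`, `dense_cells`, and `min_pts` are hypothetical. The only bound used is that a leaf cell can never hold more points than any of its ancestors, so a subtree whose count is below `min_pts` can be skipped safely.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Node:
    box: Box
    count: int  # points inside box: the density information kept in the node
    children: List["Node"] = field(default_factory=list)


def build(points: List[Point], box: Box, depth: int) -> Node:
    """Build a toy fixed-depth quadtree (assumed stand-in for a real index).

    Points are assumed to lie in the half-open box [x_min, x_max) x [y_min, y_max).
    """
    node = Node(box, len(points))
    if depth == 0:
        return node
    x0, y0, x1, y1 = box
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [(x0, y0, xm, ym), (xm, y0, x1, ym),
                 (x0, ym, xm, y1), (xm, ym, x1, y1)]
    for cx0, cy0, cx1, cy1 in quadrants:
        inside = [(x, y) for x, y in points
                  if cx0 <= x < cx1 and cy0 <= y < cy1]
        node.children.append(build(inside, (cx0, cy0, cx1, cy1), depth - 1))
    return node


def dense_cells(node: Node, min_pts: int, visited: List[Box]) -> List[Box]:
    """Return leaf cells holding at least min_pts points, searching top-down.

    Every node that is examined is appended to `visited`, so the caller can
    see how much of the tree the pruning avoided.
    """
    visited.append(node.box)
    if node.count < min_pts:
        # Bound: no descendant can contain more points than this node,
        # so no leaf below it can reach min_pts. Skip the whole subtree.
        return []
    if not node.children:
        return [node.box]
    found: List[Box] = []
    for child in node.children:
        found.extend(dense_cells(child, min_pts, visited))
    return found


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.5, 0.5), (0.7, 0.2), (0.9, 0.9), (3.5, 3.5)]
    root = build(pts, (0.0, 0.0, 4.0, 4.0), depth=2)
    visited: List[Box] = []
    print(dense_cells(root, min_pts=3, visited=visited))
    print(len(visited), "nodes examined")
```

In this toy setting the search examines only the nodes whose stored count could still allow a dense leaf, which is the kind of saving that branch-and-bound pruning over index nodes aims for. Merging adjacent dense cells into clusters is left out of the sketch.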