Clustering on large databases has been studied actively as an increasing number of applications involve huge amounts of data. In this paper, we propose an efficient top-down approach for density-based clustering, which is based on the density information stored in index nodes of a multidimensional index. We first provide a formal definition of the cluster based on the concept of region contrast partition. Based on this notion, we propose a novel top-down clustering algorithm, which improves efficiency through branch-and-bound pruning. For this pruning, we present a technique for determining the bounds based on sparse and dense internal regions and formally prove the correctness of the bounds. Experimental results show that the proposed method reduces the elapsed time by up to a factor of 96 compared with BIRCH, a well-known clustering method. The results also show that the performance improvement becomes more marked as the size of the database increases.
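To make the idea of top-down pruning over an index concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a hypothetical quadtree-like index over the unit square in which every node stores the number of points it covers, and it uses only a simple sparse-side bound: no cell below a node can hold more points than the node itself, so a node with fewer than `min_count` points can be skipped without visiting its descendants. The names `Node`, `build`, `dense_cells`, and `min_count` are illustrative assumptions, and the paper's region contrast partition and its dense-region bound are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical index node: a square region plus the point count it covers."""
    x0: float
    y0: float
    size: float
    count: int = 0
    children: list = field(default_factory=list)


def build(points, x0=0.0, y0=0.0, size=1.0, depth=3):
    """Build a small quadtree; every node records how many points fall inside it."""
    inside = [(x, y) for x, y in points
              if x0 <= x < x0 + size and y0 <= y < y0 + size]
    node = Node(x0, y0, size, len(inside))
    if depth > 0 and inside:
        half = size / 2
        for dx in (0, 1):
            for dy in (0, 1):
                node.children.append(
                    build(inside, x0 + dx * half, y0 + dy * half, half, depth - 1))
    return node


def dense_cells(node, min_count, visited):
    """Top-down search for leaf cells that hold at least min_count points."""
    visited.append(node)
    # Bound: a descendant never holds more points than its ancestor,
    # so a sparse node cannot contain a dense cell and is pruned here.
    if node.count < min_count:
        return []
    if not node.children:
        return [node]
    found = []
    for child in node.children:
        found.extend(dense_cells(child, min_count, visited))
    return found


# Illustrative data (assumed): one tight group of 20 points plus three stray points.
cluster = [(0.01 + 0.005 * i, 0.02 + 0.004 * i) for i in range(20)]
noise = [(0.9, 0.9), (0.6, 0.3), (0.3, 0.8)]
root = build(cluster + noise)
visited = []
cells = dense_cells(root, min_count=5, visited=visited)
```

In this toy setting the search descends only into the quadrant that contains the tight group, so most of the tree is never touched. A full method along the lines described in the abstract would additionally need a bound for dense regions and a step that merges adjacent dense regions into clusters.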