The rapid progress of information technology has led to ever larger amounts of data being produced and stored in databases. Extracting implicit, useful information from such data quickly and accurately is a priority in data mining, which explains why many clustering methods have been developed in recent decades. This work presents a new clustering algorithm named NPUST, an enhanced version of KIDBSCAN. NPUST is a hybrid density-based approach: it first partitions the dataset using K-means, and then clusters the resulting partitions with IDBSCAN. Finally, the closest pairs of clusters are merged until the natural number of clusters in the dataset is reached. Experimental results indicate that the proposed algorithm can identify entire clusters while lowering the run-time cost. They also show that it performs better than several well-known existing approaches, namely the K-means, DBSCAN, IDBSCAN and KIDBSCAN algorithms. Consequently, the proposed NPUST algorithm is efficient and effective for data clustering.
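The three stages described in the abstract (partition, cluster each partition by density, merge the closest clusters) can be sketched roughly as follows. This is an illustrative outline under several assumptions, not the authors' implementation: plain DBSCAN stands in for IDBSCAN, cluster closeness is measured by centroid distance, the "natural" number of clusters is passed in as the hypothetical parameter `target`, and all function names and the K-means initialisation are invented for the example.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, k, iters=20):
    """Stage 1: split the data into at most k partitions.
    Assumption: centres start at evenly spaced input points."""
    centres = [points[i * len(points) // k] for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centres[c]))
            groups[nearest].append(p)
        centres = [centroid(g) if g else centres[i]
                   for i, g in enumerate(groups)]
    return [g for g in groups if g]


def region(points, i, eps):
    """Indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Stage 2: plain DBSCAN (a stand-in for IDBSCAN).
    Returns a list of clusters; noise points are dropped."""
    labels = {}      # point index -> cluster id, or -1 for noise
    clusters = []
    for i in range(len(points)):
        if i in labels:
            continue
        neighbours = region(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1           # provisional noise
            continue
        cid = len(clusters)
        clusters.append([points[i]])
        labels[i] = cid
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:
                continue             # already assigned to a cluster
            was_noise = j in labels  # noise points become border points
            labels[j] = cid
            clusters[cid].append(points[j])
            if not was_noise:
                nj = region(points, j, eps)
                if len(nj) >= min_pts:
                    queue.extend(nj)  # j is a core point: keep expanding
    return clusters


def merge_closest(clusters, target):
    """Stage 3: merge the closest pair (by centroid distance, an
    assumption) until only `target` clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: dist(centroid(clusters[ab[0]]),
                                centroid(clusters[ab[1]])),
        )
        clusters[a].extend(clusters.pop(b))  # a < b, so index a stays valid
    return clusters


def hybrid_cluster(points, k, eps, min_pts, target):
    """Partition with K-means, cluster each partition, then merge."""
    local = []
    for part in kmeans(points, k):
        local.extend(dbscan(part, eps, min_pts))
    return merge_closest(local, target)
```

For example, calling `hybrid_cluster(points, k=4, eps=1.5, min_pts=3, target=2)` on two well-separated groups of points first over-partitions the data into four pieces and then merges the local clusters back into two. Running the density step on small partitions rather than on the whole dataset is what keeps the neighbourhood searches cheap in this sketch.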