NPUST: An Efficient Clustering Algorithm Using Partition Space Technique for Large Databases

Authors:
Cheng-Fa Tsai; Heng-Fu Yeh
Affiliations:
Department of Management Information Systems, National Pingtung University of Science and Technology, Pingtung, Taiwan 91201 (both authors)
Venue:
IEA/AIE '09 Proceedings of the 22nd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems: Next-Generation Applied Intelligence
Year:
2009

References:
BIRCH: an efficient data clustering method for very large databases. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data.
CURE: an efficient clustering algorithm for large databases. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
Chameleon: Hierarchical Clustering Using Dynamic Modeling. Computer.
ROCK: A Robust Clustering Algorithm for Categorical Attributes. ICDE '99 Proceedings of the 15th International Conference on Data Engineering.
ANGEL: a new effective and efficient hybrid clustering technique for large databases. PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining.
KIDBSCAN: a new efficient data clustering algorithm. ICAISC'06 Proceedings of the 8th international conference on Artificial Intelligence and Soft Computing.

Abstract

The rapid progress of information technology has led to ever-larger volumes of data being produced and stored in databases. Extracting implicit, useful information from these data quickly and accurately is a priority concern in data mining, which explains why many clustering methods have been developed in recent decades. This work presents a new clustering algorithm named NPUST, an enhanced version of KIDBSCAN. NPUST is a hybrid density-based approach that first partitions the dataset using K-means and then clusters the resulting partitions with IDBSCAN. Finally, the closest pairs of clusters are merged until the natural number of clusters in the dataset is reached. Experimental results indicate that the proposed algorithm can handle entire clusters and efficiently lowers the run-time cost. They also reveal that the proposed algorithm performs better than several well-known existing approaches such as the K-means, DBSCAN, IDBSCAN and KIDBSCAN algorithms. Consequently, the proposed NPUST algorithm is efficient and effective for data clustering.
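
The three stages named in the abstract (K-means partitioning, density-based clustering of each partition, merging of the closest cluster pairs) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: textbook DBSCAN stands in for IDBSCAN, the distance between two clusters is taken as the single-link (closest-point) distance, and the "natural number of clusters" is supplied by the caller as `n_clusters`. The function names `kmeans`, `dbscan`, and `npust_sketch` and all parameter values are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one partition index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a partition is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN (assumed stand-in for IDBSCAN); label -1 means noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:  # only core points expand the cluster
                queue.extend(nj)
    return labels

def npust_sketch(points, k, eps, min_pts, n_clusters):
    """Partition, cluster each partition, then merge closest cluster pairs."""
    part = kmeans(points, k)
    clusters = []  # each cluster is a list of indices into `points`
    for c in range(k):
        idx = [i for i, a in enumerate(part) if a == c]
        local = dbscan([points[i] for i in idx], eps, min_pts)
        for lab in sorted(set(local) - {-1}):
            clusters.append([idx[m] for m, l in enumerate(local) if l == lab])

    def gap(a, b):  # assumed merge criterion: single-link distance
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda xy: gap(clusters[xy[0]], clusters[xy[1]]))
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return clusters

# Two well-separated 5x5 grids of points as toy data.
blob_a = [(float(x), float(y)) for x in range(5) for y in range(5)]
blob_b = [(x + 10.0, y + 10.0) for x, y in blob_a]
demo_points = blob_a + blob_b
demo_clusters = npust_sketch(demo_points, k=4, eps=1.5, min_pts=3, n_clusters=2)
```

In this sketch the partitioning step only limits how many points each density-based run has to scan; points that DBSCAN marks as noise inside a partition are simply left out of the result, which may differ from how the paper treats them.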