PHD: an efficient data clustering scheme using partition space technique for knowledge discovery in large databases

Authors:
Cheng-Fa Tsai;Heng-Fu Yeh;Jui-Fang Chang;Ning-Han Liu
Affiliations:
Department of Management Information Systems, National Pingtung University of Science and Technology, Pingtung, Taiwan 91201;Department of Management Information Systems, National Pingtung University of Science and Technology, Pingtung, Taiwan 91201;Department of International Business, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan 80778;Department of Management Information Systems, National Pingtung University of Science and Technology, Pingtung, Taiwan 91201
Venue:
Applied Intelligence
Year:
2010

Citing 11

BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data

CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Chameleon: Hierarchical Clustering Using Dynamic Modeling
Computer

ROCK: A Robust Clustering Algorithm for Categorical Attributes
ICDE '99 Proceedings of the 15th International Conference on Data Engineering

Clustering through ranking on manifolds
ICML '05 Proceedings of the 22nd international conference on Machine learning

A survey of kernel and spectral methods for clustering
Pattern Recognition

Constrained locally weighted clustering
Proceedings of the VLDB Endowment

Non-negative matrix factorization for semi-supervised data clustering
Knowledge and Information Systems

ANGEL: a new effective and efficient hybrid clustering technique for large databases
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining

KIDBSCAN: a new efficient data clustering algorithm
ICAISC'06 Proceedings of the 8th international conference on Artificial Intelligence and Soft Computing

Cited 2

Using TF-IDF to hide sensitive itemsets
Applied Intelligence

Statistical user model supported by R-Tree structure
Applied Intelligence

Abstract

Rapid technological advances mean that the amount of data stored in databases is growing very quickly, and data mining can uncover useful implicit information in such large databases. A priority concern in data mining is how to detect this information at low time cost, with high correctness and a high noise-filtering rate, in a way that suits large databases, which explains why many clustering schemes have been proposed in recent decades. This investigation presents a new data clustering approach called PHD, an enhanced version of KIDBSCAN. PHD is a hybrid density-based algorithm that partitions the data set with K-means and then clusters the resulting partitions with IDBSCAN. Finally, the closest pairs of clusters are merged until the natural number of clusters in the data set is reached. Experimental results reveal that the proposed algorithm can complete the entire clustering while efficiently reducing the run-time cost. They also indicate that the proposed algorithm performs better than several well-known existing schemes, such as the K-means, DBSCAN, IDBSCAN and KIDBSCAN algorithms. Consequently, the proposed PHD algorithm is efficient and effective for data clustering in large databases.
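
The three stages named in the abstract (partition with K-means, cluster each partition with a density-based method, merge the closest clusters) can be illustrated with a minimal sketch. This is not the authors' implementation: plain textbook DBSCAN stands in for IDBSCAN, "closest" is assumed to mean the smallest point-to-point gap between two clusters, K-means is seeded deterministically with evenly spaced input points, and the target cluster count `k` is passed in by the caller rather than discovered. All function and parameter names (`phd_sketch`, `n_partitions`, `eps`, `min_pts`) are hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's K-means with deterministic seeding (assumes len(points) >= k)."""
    centers = points[::len(points) // k][:k]
    for _ in range(iters):
        parts = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            parts[nearest].append(p)
        # New center = mean of the partition; an empty partition keeps its old center.
        centers = [tuple(sum(x) / len(part) for x in zip(*part)) if part else centers[i]
                   for i, part in enumerate(parts)]
    return [part for part in parts if part]

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN (stand-in for IDBSCAN); returns clusters, drops noise."""
    labels = {}  # point index -> cluster id, or -1 for noise

    def neighbors(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cid = 0
    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cid
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cid  # former noise joins as a border point
                continue
            if j in labels:
                continue
            labels[j] = cid
            more = neighbors(j)
            if len(more) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(more)
        cid += 1
    return [[points[i] for i in range(len(points)) if labels[i] == c] for c in range(cid)]

def merge_closest(clusters, k):
    """Repeatedly merge the pair of clusters with the smallest gap until k remain."""
    clusters = [list(c) for c in clusters]

    def gap(a, b):
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: gap(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

def phd_sketch(points, n_partitions, eps, min_pts, k):
    """Partition, cluster each partition by density, then merge down to k clusters."""
    found = []
    for part in kmeans(points, n_partitions):
        found.extend(dbscan(part, eps, min_pts))
    return merge_closest(found, k)
```

Calling `phd_sketch(points, n_partitions=4, eps=0.15, min_pts=3, k=2)` on a list of 2-D tuples returns at most `k` lists of points. The density step in this sketch compares every pair of points inside a partition, so running it on several small partitions involves fewer distance computations than running it once on the whole data set, which is one plausible reading of the run-time saving the abstract reports.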