Rapid technological advances mean that the amount of data stored in databases is growing very quickly, and data mining can uncover useful implicit information in these large databases. Detecting that information with low time cost, high accuracy, and a high noise-filtering rate, in a way that scales to large databases, is a primary concern in data mining, which is why numerous clustering schemes have been proposed in recent decades. This investigation presents a new data clustering approach called PHD, an enhanced version of KIDBSCAN. PHD is a hybrid density-based algorithm that partitions the data set with K-means and then clusters the resulting partitions with IDBSCAN. Finally, the closest pairs of clusters are merged until the natural number of clusters in the data set is reached. Experimental results show that the proposed algorithm can carry out the entire clustering task while efficiently reducing run-time cost. They also indicate that it performs better than several well-known existing schemes, namely the K-means, DBSCAN, IDBSCAN and KIDBSCAN algorithms. Consequently, the proposed PHD algorithm is efficient and effective for data clustering in large databases.
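The three stages named in the abstract (K-means partitioning, density-based clustering inside each partition, and merging of the closest cluster pairs) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: plain DBSCAN stands in for IDBSCAN, "closest" is taken to mean the smallest point-to-point (single-link) distance between two clusters, and the target number of clusters is supplied by the caller rather than detected. The function names and the parameters `k`, `eps`, `min_pts` and `target` are all hypothetical.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm; returns one partition label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a partition empties
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def dbscan(points, eps, min_pts):
    """Plain DBSCAN (assumed stand-in for IDBSCAN); noise points are dropped."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    label = [None] * n  # None = unvisited, -1 = noise, >= 0 = cluster id
    cid = 0
    for i in range(n):
        if label[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            label[i] = -1
            continue
        label[i] = cid
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if label[j] == -1:
                label[j] = cid  # former noise becomes a border point
            if label[j] is not None:
                continue
            label[j] = cid
            if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbors[j])
        cid += 1
    return [[points[i] for i in range(n) if label[i] == c] for c in range(cid)]

def merge_closest(clusters, target):
    """Merge the pair with the smallest single-link gap until `target` remain."""
    clusters = [list(c) for c in clusters]

    def gap(a, b):
        return min(math.dist(p, q) for p in clusters[a] for q in clusters[b])

    while len(clusters) > target:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = min(pairs, key=lambda ab: gap(*ab))
        clusters[a].extend(clusters.pop(b))
    return clusters

def hybrid_cluster(points, k, eps, min_pts, target):
    """Partition with K-means, cluster each partition, then merge."""
    labels = kmeans(points, k)
    clusters = []
    for c in range(k):
        part = [p for p, l in zip(points, labels) if l == c]
        clusters.extend(dbscan(part, eps, min_pts))
    return merge_closest(clusters, target)

# Two well-separated 5x5 grids of points as toy data.
blob_a = [(float(x), float(y)) for x in range(5) for y in range(5)]
blob_b = [(x + 20.0, y + 20.0) for x, y in blob_a]
result = hybrid_cluster(blob_a + blob_b, k=4, eps=1.5, min_pts=3, target=2)
print([len(c) for c in result])
```

In this sketch the K-means step deliberately over-partitions the data (`k` larger than `target`), so that each density-based run works on a smaller subset, and the merge step then rejoins fragments that the partition boundaries split apart. The neighbor search and the merge loop are quadratic or worse, so the code is only suitable for small illustrative inputs.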