Scalable Clustering for Mining Local-Correlated Clusters in High Dimensions and Large Datasets

Authors:
Kun-Che Lu; Don-Lin Yang
Affiliations:
Department of Information Engineering and Computer Science, Feng Chia University, 100 Wen Hwa Road, Taichung, Taiwan (both authors). E-mail: kjlu@selab.iecs.fcu.edu.tw / dlyang@fcu.edu.tw
Venue:
Fundamenta Informaticae - Intelligent Data Analysis in Granular Computing
Year:
2010

Abstract

Clustering is useful for mining the underlying structure of a dataset to support decision making, since target or high-risk groups can be identified. However, for high-dimensional datasets, the results of traditional clustering methods can be meaningless, because clusters may be characterized by only a small subset of the features. Taking customer datasets as an example, some customers may be correlated in their salary and education, while others may be correlated in their job and house location. If all the features of a customer are used for clustering, these local-correlated clusters may not be revealed. In addition, processing high-dimensional, large datasets is a challenging problem in decision making. Extracting local-correlated clusters by searching every combination of features and records is infeasible, because the search space grows exponentially with both data dimensionality and cardinality. In this paper, we propose a scalable 2-Leveled Approximated Hyper-Image-based Clustering framework, referred to as 2L-HIC-A, for mining local-correlated clusters, in which the clustering process at each level requires only one scan of the original dataset. Moreover, the data-processing time of 2L-HIC-A can be independent of the input data size. In 2L-HIC-A, various well-developed image processing techniques can be exploited for mining clusters. Instead of proposing a new clustering algorithm, our framework can accommodate other clustering methods for mining local-correlated clusters and sheds new light on existing clustering techniques.
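The abstract does not specify how a hyper-image is built or which image-processing operations 2L-HIC-A applies, so the following is only a rough illustration under stated assumptions, not the authors' procedure. It assumes a "hyper-image" can be approximated by a density grid filled in one scan after projecting each record onto a hand-picked pair of features, and it uses thresholding plus 4-connected component labeling as the image-processing step. Fixing the subspace by hand sidesteps the hard part: d features already yield 2^d − 1 non-empty feature subsets. The names `build_hyper_image` and `label_dense_regions`, the grid resolution `bins`, the value ranges `lo`/`hi`, and the density `threshold` are all hypothetical.

```python
from collections import deque


def build_hyper_image(records, dims, bins, lo, hi):
    """One scan: project each record onto two features and count it in a bins x bins grid.

    Hypothetical helper. dims holds the indices of the two features forming the
    subspace; lo/hi are per-feature value ranges assumed known in advance (hi > lo).
    """
    image = [[0] * bins for _ in range(bins)]
    for rec in records:
        cell = []
        for k, d in enumerate(dims):
            span = hi[k] - lo[k]
            idx = int((rec[d] - lo[k]) / span * bins)
            # Clamp so values at or beyond the range edges land in the border bins.
            cell.append(min(max(idx, 0), bins - 1))
        image[cell[0]][cell[1]] += 1
    return image


def label_dense_regions(image, threshold):
    """Threshold the density image, then label 4-connected dense cells via BFS.

    Hypothetical helper. Its cost depends only on the grid size, not on the
    number of records that were scanned to build the grid.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= threshold and labels[r][c] == 0:
                current += 1
                labels[r][c] = current
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] >= threshold
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return labels, current


# Toy data: two groups that are tight in features 0 and 1 but scattered in feature 2.
records = [
    (1.0, 1.2, 7.0), (1.1, 1.0, 2.0), (1.3, 1.1, 9.0),
    (8.0, 8.5, 3.0), (8.2, 8.1, 5.0), (8.4, 8.3, 1.0),
]
image = build_hyper_image(records, dims=(0, 1), bins=5, lo=(0.0, 0.0), hi=(10.0, 10.0))
labels, n_clusters = label_dense_regions(image, threshold=2)
print(n_clusters)  # 2
```

In this toy example the two groups appear as separate dense regions only in the (0, 1) projection; adding feature 2 would tend to spread each group across more cells and dilute its density, which mirrors the local-correlation effect the abstract describes. After the single scan, the labeling step touches only the `bins × bins` cells, which is one way a processing cost independent of the number of records could arise.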