This paper applies a divide-and-conquer approach iteratively to the clustering process. The goal is a parallelized approach that is both effective and efficient and that produces the intended clustering result. We achieve scalability by first partitioning a large dataset into subsets of manageable size, chosen according to the specifications of the machine used for clustering, and then clustering the partitions separately in parallel. The centroid of each resulting cluster is treated as the root of a tree whose leaves are the instances in that cluster. The partitioning and clustering steps are then applied iteratively to the centroids, with the trees growing upward, until the final clustering is reached; the outcome is a forest with one tree per cluster. Finally, a conquer step recovers the actual intended clustering: each instance (leaf node) belongs to the final cluster represented by the root of its tree. We use a multi-objective genetic algorithm combined with validity indices to decide on the number of clusters. The approach is well suited to interactive online clustering. It also facilitates incremental clustering, because chunks of instances are clustered as standalone sets and the results are then merged with the existing clusters. This is attractive and feasible because only centroids are clustered after the first clustering stage. The reported test results demonstrate the applicability and effectiveness of the proposed approach.
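The partition, cluster, and conquer steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain k-means with deterministic seeding stands in for the multi-objective genetic algorithm, the number of clusters is fixed by hand instead of being chosen with validity indices, and the chunks are processed sequentially although they are independent and could run in parallel. All names (`kmeans`, `cluster_level`, `divide_and_conquer_cluster`, `chunk_size`, `k_per_chunk`, `final_k`) are hypothetical.

```python
from math import dist  # Python 3.8+


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with farthest-point seeding (a stand-in clusterer).

    Returns (centroids, assignment), where assignment[i] indexes centroids.
    """
    k = min(k, len(points))
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from the current seeds.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    assignment = None
    for _ in range(iters):
        new_assignment = [min(range(k), key=lambda j: dist(p, centroids[j]))
                          for p in points]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


def cluster_level(items, chunk_size, k_per_chunk):
    """Cluster each chunk on its own; return (centroids, parent).

    parent[i] is the index, in the returned centroid list, of the tree node
    that item i hangs under. Chunks are independent of one another.
    """
    centroids, parent = [], []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        cents, assign = kmeans(chunk, k_per_chunk)
        offset = len(centroids)
        centroids.extend(cents)
        parent.extend(offset + a for a in assign)
    return centroids, parent


def divide_and_conquer_cluster(points, chunk_size, k_per_chunk, final_k):
    """Return (roots, labels); labels[i] is the root index of point i."""
    assert k_per_chunk < chunk_size, "each level must shrink the data"
    levels = []  # levels[d][i] = parent of node i at depth d
    items = list(points)
    # Divide: cluster chunk by chunk until the centroids fit in one chunk.
    while len(items) > chunk_size:
        items, parent = cluster_level(items, chunk_size, k_per_chunk)
        levels.append(parent)
    # Top level: one last clustering of the surviving centroids gives the roots.
    roots, parent = kmeans(items, final_k)
    levels.append(parent)
    # Conquer: walk every leaf up its tree to find its root.
    labels = []
    for i in range(len(points)):
        node = i
        for level in levels:
            node = level[node]
        labels.append(node)
    return roots, labels


# Toy data: three well-separated blobs, interleaved so every chunk is mixed.
blobs = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [(cx + 0.1 * (i % 4), cy + 0.1 * (i // 4))
          for i in range(12) for cx, cy in blobs]
roots, labels = divide_and_conquer_cluster(points, chunk_size=9,
                                           k_per_chunk=3, final_k=3)
print(len(roots), len(set(labels)))
```

In this sketch only the first level touches the raw instances; every later level clusters centroids, which is the property the abstract relies on for scalability and for merging newly arrived chunks with existing clusters.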