Clustering is a descriptive data mining task that aims to partition data into homogeneous groups. This paper presents a novel evolutionary algorithm (NOCEA) that efficiently and effectively clusters massive numerical databases. NOCEA evolves variable-length individuals consisting of disjoint, axis-aligned hyper-rectangular rules with homogeneous data distribution. The antecedent part of each rule includes an interval-like condition for each dimension. A novel quantisation algorithm imposes a regular multi-dimensional grid structure onto the data space to reduce the number of search combinations. Because of quantisation, the boundaries of the intervals are encoded as integer values. The evolutionary search is guided by a simple data coverage maximisation function. The enormous data space is explored effectively by task-specific recombination and mutation operators that produce candidate solutions with no overlapping rules. A parsimony generalisation operator shortens the discovered knowledge by replacing adjacent rules with more generic ones. NOCEA also employs a special homogeneity operator that enforces a quasi-uniform data distribution in the space enclosed by the candidate rules. After convergence, the discovered knowledge undergoes simplification to perform subspace clustering and to assemble the clusters. Results on real-world datasets show that NOCEA has several attractive properties for clustering, including: (a) comprehensible output in the form of disjoint and homogeneous rules, (b) the ability to discover clusters of arbitrary shape, density, size, and data coverage, (c) the ability to perform effective subspace clustering, (d) near-linear scalability with the database size and with data and cluster dimensionality, and (e) substantial potential for task parallelism (a speedup of 13.8 on 16 processors). A detailed study of the seismicity along the African-Eurasian-Arabian plate boundaries serves as a real-world example.
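The rule encoding and the coverage-based fitness described above can be illustrated with a minimal Python sketch. It is not NOCEA's implementation: the uniform-width grid stands in for the paper's quantisation algorithm, whose details the abstract does not give, and the names `Rule`, `quantise`, `is_disjoint`, and `coverage` are hypothetical. The sketch only shows how integer-encoded, axis-aligned intervals can be checked for overlap and scored by the fraction of data points they cover.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Cell = Tuple[int, ...]


@dataclass(frozen=True)
class Rule:
    """Axis-aligned hyper-rectangle: one inclusive (lo_bin, hi_bin) pair per dimension."""
    bounds: Tuple[Tuple[int, int], ...]

    def covers(self, cell: Cell) -> bool:
        # A grid cell is covered when every coordinate lies inside its interval.
        return all(lo <= c <= hi for (lo, hi), c in zip(self.bounds, cell))

    def overlaps(self, other: "Rule") -> bool:
        # Two boxes intersect only if their intervals intersect in every dimension.
        return all(a_lo <= b_hi and b_lo <= a_hi
                   for (a_lo, a_hi), (b_lo, b_hi) in zip(self.bounds, other.bounds))


def quantise(point: Sequence[float], mins: Sequence[float],
             widths: Sequence[float], bins: Sequence[int]) -> Cell:
    """Map a numeric point to integer bin indices.

    Assumption: a uniform-width grid, used here only as a stand-in
    for the paper's own quantisation algorithm.
    """
    return tuple(min(max(int((x - m) / w), 0), n - 1)
                 for x, m, w, n in zip(point, mins, widths, bins))


def is_disjoint(rules: Sequence[Rule]) -> bool:
    """True when no pair of rules in the individual overlaps."""
    return not any(a.overlaps(b)
                   for i, a in enumerate(rules) for b in rules[i + 1:])


def coverage(rules: Sequence[Rule], cells: Sequence[Cell]) -> float:
    """Fraction of quantised data points covered by at least one rule."""
    if not cells:
        return 0.0
    covered = sum(1 for cell in cells if any(r.covers(cell) for r in rules))
    return covered / len(cells)


# Toy data: five 2-D points on a 10 x 10 grid with unit-width bins.
points = [(0.5, 0.5), (1.5, 0.5), (8.5, 9.5), (9.9, 9.9), (5.0, 5.0)]
cells = [quantise(p, (0.0, 0.0), (1.0, 1.0), (10, 10)) for p in points]

# A variable-length individual made of two non-overlapping rules.
individual = [Rule(((0, 1), (0, 0))), Rule(((8, 9), (9, 9)))]
print(is_disjoint(individual), coverage(individual, cells))
```

In this toy setting the individual covers four of the five points, and the isolated point at (5.0, 5.0) is left uncovered. A full evolutionary loop would add the recombination, mutation, generalisation, and homogeneity operators that the abstract mentions, none of which are modelled here.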