What Makes a Problem Hard for a Genetic Algorithm? Some Anomalous Results and Their Explanation
Machine Learning - Special issue on genetic algorithms
Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms
Self-Organizing Maps
On Clustering Validation Techniques
Journal of Intelligent Information Systems
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
STING: A Statistical Information Grid Approach to Spatial Data Mining
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
A methodology to find clusters in the data based on Shannon's entropy and genetic algorithms
ACELAE'11 Proceedings of the 10th WSEAS international conference on communications, electrical & computer engineering, and 9th WSEAS international conference on Applied electromagnetics, wireless and optical communications
Traditional data clustering algorithms rely on similarity criteria that depend on a distance metric. This imposes important constraints on the shape of the clusters found: they are generally hyperspherical in the metric's space, because each element of a cluster lies within a radial distance of a given center. In this paper we propose a clustering algorithm that does not depend on simple distance metrics and can therefore find clusters of arbitrary shape in n-dimensional space. Our proposal is based on concepts from Shannon's information theory and evolutionary computation. Each cluster consists of a subset of the data in which entropy is minimized. This is a highly non-linear and usually non-convex optimization problem, which rules out traditional optimization techniques. To solve it we apply a rugged genetic algorithm (the so-called Vasconcelos' GA). To test the efficiency of our proposal we artificially created several data sets with known properties in three-dimensional space. The results show that the algorithm is able to find highly irregular clusters that traditional algorithms cannot. Some previous work relies on similar approaches (such as ENCLUS and CLIQUE); the differences between those approaches and ours are also discussed.
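To make the idea of an entropy-based clustering objective concrete, the following is a minimal sketch and not the paper's method. The abstract does not say how entropy is estimated, so the grid discretization, the size-weighted average, and the names `grid_entropy` and `partition_cost` are all assumptions introduced here for illustration. A genetic algorithm could, under these assumptions, search over label vectors for one that minimizes `partition_cost`.

```python
import math
from collections import Counter


def grid_entropy(points, bins=4, lo=0.0, hi=1.0):
    """Shannon entropy (in bits) of the grid-cell occupancy of `points`.

    Assumption: every coordinate lies in [lo, hi] and each axis is split
    into `bins` equal-width cells. This estimator is illustrative only.
    """
    if not points:
        return 0.0
    width = (hi - lo) / bins
    # Map each point to the index tuple of the cell that contains it.
    cells = Counter(
        tuple(min(bins - 1, max(0, int((x - lo) / width))) for x in p)
        for p in points
    )
    n = len(points)
    return -sum((c / n) * math.log2(c / n) for c in cells.values())


def partition_cost(points, labels, bins=4):
    """Size-weighted mean entropy of the clusters induced by `labels`.

    Hypothetical fitness function: lower values mean each cluster
    occupies fewer grid cells more consistently.
    """
    groups = {}
    for p, k in zip(points, labels):
        groups.setdefault(k, []).append(p)
    n = len(points)
    return sum(len(g) / n * grid_entropy(g, bins) for g in groups.values())


# Two tight groups of points in three-dimensional space.
points = [(0.1, 0.1, 0.1)] * 4 + [(0.9, 0.9, 0.9)] * 4
good_labels = [0] * 4 + [1] * 4   # each cluster falls in a single cell
bad_labels = [0, 1] * 4           # each cluster is split across two cells

good_cost = partition_cost(points, good_labels)
bad_cost = partition_cost(points, bad_labels)
```

In this toy case the labeling that keeps each group together has the lower cost, which is the property an entropy-minimizing search would exploit. Note that the cost uses cell occupancy rather than distance to a center, so it does not by itself favor spherical clusters.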