A methodology to find clusters in the data based on Shannon's entropy and genetic algorithms

Authors:
Edwyn Aldana-Bobadilla;Angel Kuri-Morales
Affiliations:
Instituto de Investigaciones en Matemáticas Aplicadas y Sistemas, Universidad Nacional Autónoma de México, Ciudad Universitaria, Mexico City, Mexico;Instituto Tecnológico Autónomo de México, Mexico City, Mexico
Venue:
ACELAE'11 Proceedings of the 10th WSEAS international conference on communications, electrical & computer engineering, and 9th WSEAS international conference on Applied electromagnetics, wireless and optical communications
Year:
2011

Abstract

The most common clustering methods are based on metrics that determine the similarity between elements of a given data set. This similarity allows us to divide the data set into subsets (clusters) that contain "highly similar" elements. The use of a metric imposes two constraints. First, the clusters found are generally hyper-spherical (in the space of the metric), because each element in a cluster lies within a radial distance of a given center. Second, the metric may be sensitive to the probability density function (pdf) of the data set. For these reasons, several methods based on statistical approaches have become an attractive and powerful option. These involve estimating the pdf of the data set that minimizes an optimality criterion. In general this is a highly non-linear and usually non-convex optimization problem, which rules out traditional optimization techniques. In this paper we propose a statistical method based on Shannon's conditional entropy which uses a rugged genetic algorithm to find the optimal pdf. Each individual of the genetic algorithm is a possible solution of a clustering problem. The fitness of an individual is determined by the Shannon entropy of the solution encoded in its genome, together with an additional constraint related to the "quality" of this solution. The "quality" is measured through a validity index of the clustering process. A novel and important aspect of our method is the way the objects of the data set are represented, which reduces the computational complexity caused by high dimensionality. We show that, on a synthetic data set, our proposal is highly effective relative to methods such as k-means, fuzzy c-means and Kohonen maps.
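
To make the central quantity concrete, the following is a minimal sketch of Shannon's conditional entropy H(X | C) for a candidate clustering. It is not the authors' implementation: the abstract does not say how objects are encoded, how the entropy enters the fitness, or which validity index is used. The function name `conditional_entropy` and the assumption that each object has already been reduced to one discrete symbol are purely illustrative.

```python
import math
from collections import Counter


def conditional_entropy(symbols, labels):
    """Return H(X | C) in bits.

    symbols: one discrete symbol per object (assumed encoding, hypothetical).
    labels:  the cluster label assigned to each object by a candidate solution.
    """
    if len(symbols) != len(labels):
        raise ValueError("symbols and labels must have the same length")
    n = len(symbols)
    joint = Counter(zip(labels, symbols))   # counts of (cluster, symbol) pairs
    cluster_sizes = Counter(labels)         # counts per cluster
    h = 0.0
    for (cluster, _symbol), count in joint.items():
        p_joint = count / n                          # p(c, x)
        p_given = count / cluster_sizes[cluster]     # p(x | c)
        h -= p_joint * math.log2(p_given)
    return h


# Clusters that separate the symbols perfectly leave no uncertainty.
print(conditional_entropy(list("aabb"), [0, 0, 1, 1]))  # 0.0
# Clusters that mix the symbols evenly leave one full bit of uncertainty.
print(conditional_entropy(list("aabb"), [0, 1, 0, 1]))  # 1.0
```

In a genetic algorithm of the kind the abstract describes, a value like this could be evaluated for the label assignment decoded from each individual's genome and combined with a validity index. How the two are combined is left open here because the abstract does not specify it.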