Finding irregularly shaped clusters based on entropy

Authors:
Angel Kuri-Morales; Edwin Aldana-Bobadilla
Affiliations:
Department of Computation, Autonomous Technological Institute of Mexico, Mexico City, Mexico; Institute of Research in Applied Mathematics and Systems, Autonomous University of Mexico, Mexico City, Mexico
Venue:
ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects
Year:
2010

Abstract

In data clustering, the more traditional algorithms rely on similarity criteria that depend on a distance metric. This dependence imposes important constraints on the shape of the clusters found: they are generally hyperspherical in the metric's space, because each element of a cluster lies within a radial distance of a given center. In this paper we propose a clustering algorithm that does not depend on simple distance metrics and therefore allows us to find clusters of arbitrary shape in n-dimensional space. Our proposal is based on concepts from Shannon's information theory and evolutionary computation. Here each cluster consists of a subset of the data in which entropy is minimized. This is a highly non-linear and usually nonconvex optimization problem, which rules out traditional optimization techniques. To solve it we apply a rugged genetic algorithm (the so-called Vasconcelos' GA). To test the efficiency of our proposal we artificially created several data sets with known properties in three-dimensional space. The results show that our algorithm is able to find highly irregular clusters that traditional algorithms cannot. Some previous work relies on similar approaches (such as ENCLUS and CLIQUE); the differences between those approaches and ours are also discussed.
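As a rough illustration of what "a subset of the data where entropy is minimized" could mean in practice, the following Python sketch scores a candidate partition by the size-weighted Shannon entropy of its clusters. It is not the authors' objective function: the abstract does not specify how entropy is estimated, so the grid discretization, the weighting, and the names `shannon_entropy`, `partition_entropy`, `bins`, and `bounds` are all assumptions made for this example. The genetic search over partitions is not shown.

```python
import numpy as np


def shannon_entropy(points, bins=8, bounds=(0.0, 1.0)):
    """Entropy in bits of a point set after binning it on a regular grid.

    Assumption: each coordinate lies in `bounds` and is split into `bins`
    equal-width cells; the abstract does not prescribe this estimator.
    """
    points = np.asarray(points, dtype=float)
    if len(points) == 0:
        return 0.0
    dims = points.shape[1]
    counts, _ = np.histogramdd(points, bins=bins, range=[bounds] * dims)
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


def partition_entropy(points, labels, bins=8, bounds=(0.0, 1.0)):
    """Size-weighted sum of per-cluster entropies (hypothetical fitness)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        weight = len(members) / len(points)
        total += weight * shannon_entropy(members, bins, bounds)
    return total


# Two tight groups in three-dimensional space, four points each.
pts = np.array([[0.1, 0.1, 0.1]] * 4 + [[0.9, 0.9, 0.9]] * 4)
separated = [0, 0, 0, 0, 1, 1, 1, 1]   # each label covers one group
mixed = [0, 1, 0, 1, 0, 1, 0, 1]       # each label straddles both groups

print(partition_entropy(pts, separated))
print(partition_entropy(pts, mixed))
```

Under these assumptions, the labeling that keeps each group together scores lower than the one that mixes them. An evolutionary search such as the one the abstract describes would use a score of this kind as the fitness to minimize.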