Information-theoretic clustering: A representative and evolutionary approach

  • Authors:
  • Daniel Araújo;Adriao Dória Neto;Allan Martins

  • Affiliations:
  • Federal Rural University of Semi-Arido, Campus Angicos, Angicos-RN, Brazil and Federal University of Rio Grande do Norte, Department of Computer Engineering and Automation, Natal-RN, Brazil;Federal University of Rio Grande do Norte, Department of Computer Engineering and Automation, Natal-RN, Brazil;Federal University of Rio Grande do Norte, Department of Computer Engineering and Automation, Natal-RN, Brazil

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 12.05

Abstract

This paper proposes a new perspective on non-parametric entropy-based clustering. We developed a new cost function for clustering that measures the cross information potential (CIP) between clusters in a dataset using representative points, which we call the representative CIP (rCIP). The measure rests on the idea that optimizing the cross information potential is equivalent to minimizing the cross entropy between clusters. It differs from existing measures in that, instead of using all points in a dataset, it uses only representative points to quantify the interaction between distributions, without any loss of the original properties of the cross information potential. This brings a double advantage: it decreases the computational cost of computing the cross information potential, thus drastically reducing the running time, and it uses the underlying statistics of the region of space where the representative points lie to measure the interaction. The result is a useful non-parametric estimator of entropy that makes it possible to use the cross information potential in applications where it was not used before. Due to the nature of clustering problems, we propose a genetic algorithm that uses rCIP as its cost function. We ran several tests and compared the results with the single-linkage hierarchical algorithm, a finite mixture of Gaussians, and spectral clustering on both synthetic and real image segmentation datasets. Experiments showed that our approach achieved better results than the other algorithms and was capable of capturing the real structure of the data in most cases, regardless of its complexity. It also produced good image segmentation, with the advantage of a tuning parameter that provides a way of refining the segmentation.
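
To make the quantity discussed above concrete, the following is a minimal sketch of the standard Parzen-window estimate of the cross information potential between two clusters, computed over all points with a Gaussian kernel. It is not the rCIP measure of the paper, whose construction from representative points is not given in this abstract. The function name cross_information_potential and the kernel width sigma are assumptions chosen for illustration. The double sum over all pairs is the cost that a representative-point version would presumably reduce.

```python
import numpy as np


def cross_information_potential(x, y, sigma=1.0):
    """Parzen-window estimate of the CIP between two samples.

    x has shape (n, d) and y has shape (m, d). sigma is the width of the
    Gaussian kernel placed on each point (an illustrative assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x.shape[1]
    # Squared Euclidean distance between every point of x and every point of y.
    diff = x[:, None, :] - y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # The convolution of two Gaussians of variance sigma^2 is a Gaussian
    # of variance 2 * sigma^2.
    var = 2.0 * sigma ** 2
    norm = (2.0 * np.pi * var) ** (-d / 2.0)
    # Average the kernel over all n * m pairs.
    return float(np.mean(norm * np.exp(-sq_dist / (2.0 * var))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=(200, 2))
    b = rng.normal(5.0, 1.0, size=(200, 2))
    print(cross_information_potential(a, b, sigma=0.5))
```

A small value suggests that the two samples overlap little, while a larger value suggests stronger interaction between them. The estimate needs n * m kernel evaluations, which is why evaluating it on a reduced set of points is attractive for large datasets.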