Assessing clustering reliability and features informativeness by random permutations

  • Authors:
  • Michele Ceccarelli;Antonio Maratea

  • Affiliations:
  • Research Centre On Software Technology, University of Sannio, Benevento, Italy;Research Centre On Software Technology, University of Sannio, Benevento, Italy

  • Venue:
  • KES'07/WIRN'07 Proceedings of the 11th international conference, KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part III
  • Year:
  • 2007

Abstract

Assessing the quality of a clustering outcome is a challenging task that can be cast in a number of different frameworks, depending on the specific subtask, such as estimating the right number of clusters or quantifying how strongly the data support the partition produced by the algorithm. In this paper we propose a computationally intensive procedure to evaluate: (i) the consistency of a clustering solution, (ii) the informativeness of each feature and (iii) the most suitable value for a parameter. The proposed approach does not depend on the specific clustering algorithm chosen; it is based on random permutations and produces an ensemble of empirical probability distributions of a quality index. By inspecting this ensemble it is possible to extract hints on how single features affect the clustering outcome, how consistent the clustering result is, and what the most suitable value for a parameter is (e.g. the correct number of clusters). Results on simulated and real data show surprisingly effective discriminative power.
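
The abstract does not spell out the permutation scheme or the quality index, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' procedure. It assumes that (a) the values of selected features are shuffled across samples, (b) the clustering algorithm is a plain k-means, and (c) the quality index is the fraction of total variance explained by the partition. All function names (`kmeans_wss`, `quality_index`, `permuted_indices`, `make_toy_data`) and the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans_wss(data, k, rng, restarts=5, iters=25):
    """Plain Lloyd k-means (assumed algorithm); returns the lowest
    within-cluster sum of squares found over a few random restarts."""
    best = float("inf")
    for _ in range(restarts):
        centers = [list(p) for p in rng.sample(data, k)]
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for p in data:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[j].append(p)
            # Keep the old center if a cluster becomes empty.
            centers = [mean_point(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        wss = sum(min(sq_dist(p, c) for c in centers) for p in data)
        best = min(best, wss)
    return best

def quality_index(data, k, rng):
    """Assumed index: fraction of total variance explained by the partition."""
    center = mean_point(data)
    tss = sum(sq_dist(p, center) for p in data)
    return 1.0 - kmeans_wss(data, k, rng) / tss

def permuted_indices(data, k, columns, n_perm=30, seed=0):
    """Empirical distribution of the index when the given feature
    columns are randomly permuted across samples."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_perm):
        shuffled = [list(row) for row in data]
        for j in columns:
            col = [row[j] for row in shuffled]
            rng.shuffle(col)
            for row, v in zip(shuffled, col):
                row[j] = v
        out.append(quality_index(shuffled, k, rng))
    return out

def make_toy_data(n_per_cluster=20, seed=1):
    """Two clusters separated along features 0 and 1; feature 2 is noise."""
    rng = random.Random(seed)
    data = []
    for mu in (0.0, 5.0):
        for _ in range(n_per_cluster):
            data.append([rng.gauss(mu, 0.5), rng.gauss(mu, 0.5),
                         rng.gauss(0.0, 0.5)])
    return data

data = make_toy_data()
observed = quality_index(data, 2, random.Random(0))
null_f0 = permuted_indices(data, 2, columns=[0])  # informative feature
null_f2 = permuted_indices(data, 2, columns=[2])  # noise feature
# Share of permutations whose index is at least as good as the observed one.
p_value = sum(v >= observed for v in null_f0) / len(null_f0)
```

In this toy setting, permuting an informative feature should lower the index markedly, while permuting the noise feature should leave it almost unchanged. Repeating the comparison for several candidate values of `k` would be one way to explore the parameter-selection use mentioned in the abstract.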