Assessing clustering reliability and features informativeness by random permutations

Authors:
Michele Ceccarelli;Antonio Maratea
Affiliations:
Research Centre On Software Technology, University of Sannio, Benevento, Italy;Research Centre On Software Technology, University of Sannio, Benevento, Italy
Venue:
KES'07/WIRN'07 Proceedings of the 11th international conference, KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part III
Year:
2007

Abstract

Assessing the quality of a clustering outcome is a challenging task that can be cast in a number of different frameworks, depending on the specific subtask, such as estimating the right number of clusters or quantifying how much the data support the partition given by the algorithm. In this paper we propose a computationally intensive procedure to evaluate: (i) the consistency of a clustering solution, (ii) the informativeness of each feature and (iii) the most suitable value for a parameter. The proposed approach does not depend on the specific clustering algorithm chosen; it is based on random permutations and produces an ensemble of empirical probability distributions of a quality index. By inspecting this ensemble it is possible to extract hints on how single features affect the clustering outcome, how consistent the clustering result is and what the most suitable value for a parameter is (e.g. the correct number of clusters). Results on simulated and real data highlight a surprisingly effective discriminative power.
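
The abstract does not spell out the procedure, so the following is only a minimal sketch of what a permutation-based assessment of this kind might look like, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the clustering algorithm (a plain k-means stand-in), the quality index (fraction of variance explained by the partition), the permutation scheme (shuffling feature columns independently across samples), the empirical p-value convention, and all function names.

```python
import numpy as np

def quality_index(X, labels):
    """Assumed quality index: 1 - within-cluster SS / total SS."""
    tss = ((X - X.mean(axis=0)) ** 2).sum()
    wss = sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
              for j in np.unique(labels))
    return 1.0 - wss / tss

def kmeans_labels(X, k, rng, n_init=3, n_iter=20):
    """Plain Lloyd's k-means with restarts; a stand-in for any algorithm."""
    best_labels, best_q = None, -np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            for j in range(k):
                members = X[labels == j]
                if len(members):  # keep the old center if a cluster empties
                    centers[j] = members.mean(axis=0)
        q = quality_index(X, labels)
        if q > best_q:
            best_labels, best_q = labels, q
    return best_labels

def permutation_ensemble(X, k, n_perm=30, seed=0):
    """Return the observed index and two families of permuted indices.

    null_all[b]     : index after shuffling every feature independently
    null_feat[f, b] : index after shuffling only feature f
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    observed = quality_index(X, kmeans_labels(X, k, rng))
    null_all = np.empty(n_perm)
    null_feat = np.empty((n_features, n_perm))
    for b in range(n_perm):
        # Shuffling each column separately keeps the marginals but
        # breaks the joint structure between features.
        Xp = np.column_stack([rng.permutation(X[:, f])
                              for f in range(n_features)])
        null_all[b] = quality_index(Xp, kmeans_labels(Xp, k, rng))
        for f in range(n_features):
            Xf = X.copy()
            Xf[:, f] = rng.permutation(X[:, f])
            null_feat[f, b] = quality_index(Xf, kmeans_labels(Xf, k, rng))
    return observed, null_all, null_feat

def empirical_p_value(observed, null_values):
    """Share of permuted indices at least as good as the observed one."""
    return (1 + np.sum(null_values >= observed)) / (1 + len(null_values))
```

Under these assumptions, an observed index that sits far above the `null_all` distribution suggests the partition reflects joint structure in the data, and a feature whose row in `null_feat` drops well below the observed value is one the clustering depends on. Repeating the call for several values of `k` and comparing the gap between observed and permuted indices is one hypothetical way to use the same ensemble for parameter selection.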