A comprehensive validity index for clustering

Authors:
S. Saitta; B. Raphael; I. F. C. Smith
Affiliations:
(Corresponding author. Tel.: +41 21 693 63 72 / Fax: +41 21 693 47 48 / E-mail: sandro.saitta@gmail.com) Ecole Polytechnique Fédérale de Lausanne (EPFL), Station 18, Lausanne, Switzerland; National University of Singapore, 117566, Singapore; Ecole Polytechnique Fédérale de Lausanne (EPFL), Station 18, Lausanne, Switzerland
Venue:
Intelligent Data Analysis
Year:
2008

Abstract

Cluster validity indices are used both to estimate the quality of a clustering algorithm and to determine the correct number of clusters in data. Although several indices exist in the literature, most of them are relevant only for data sets that contain at least two clusters. This paper introduces a new bounded index for cluster validity, called the score function (SF): a double exponential expression based on a ratio of standard cluster parameters. Several artificial and real-life data sets are used to evaluate the performance of the score function. These data sets contain a range of features and patterns, such as unbalanced, overlapping and noisy clusters. Cases involving sub-clusters and perfect clusters are also tested. The score function is compared against six previously proposed validity indices. For hyper-spheroidal clusters, the proposed index is found to be always at least as good as these indices. It is also shown to work well on multidimensional and noisy data sets. One of its advantages is its ability to handle the single-cluster case and sub-cluster hierarchies.
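The abstract does not give the score function explicitly, so the following is only a minimal sketch under stated assumptions, meant to show how a bounded double exponential of a cluster-parameter ratio behaves. It assumes the ratio is bcd/wcd and that SF = 1 - 1/exp(exp(bcd/wcd)). Here bcd (a between-cluster distance: the size-weighted distance of each cluster centroid from the overall centroid, divided by n*k) and wcd (a within-cluster distance: the sum over clusters of the mean point-to-centroid distance) are hypothetical names and definitions chosen for illustration, not confirmed as the paper's. Under these assumptions SF lies in [1 - 1/e, 1]. A single cluster (bcd = 0) still yields the defined value 1 - 1/e ≈ 0.632, which illustrates how an index of this shape can cover the single-cluster case that ratio indices requiring k ≥ 2 cannot.

```python
import math
from typing import List, Sequence

# A point is a sequence of coordinates; a clustering is a list of clusters.
Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def score_function(clusters: Sequence[Sequence[Point]]) -> float:
    """Sketch of a bounded double-exponential validity score.

    ASSUMPTION: SF = 1 - 1/exp(exp(bcd / wcd)) with the hypothetical
    bcd/wcd definitions below; not confirmed as the paper's exact formula.
    Every cluster must be non-empty.
    """
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    z_tot = centroid(all_points)
    centroids = [centroid(c) for c in clusters]

    # Hypothetical between-cluster distance: size-weighted distance of each
    # cluster centroid from the overall centroid, normalised by n * k.
    bcd = sum(len(c) * math.dist(z, z_tot) for z, c in zip(centroids, clusters)) / (n * k)

    # Hypothetical within-cluster distance: sum over clusters of the mean
    # distance from each point to its own cluster centroid.
    wcd = sum(sum(math.dist(x, z) for x in c) / len(c) for z, c in zip(centroids, clusters))

    if wcd > 0:
        ratio = bcd / wcd
    else:
        # Perfect (zero-spread) clusters: the ratio tends to infinity if bcd > 0.
        ratio = math.inf if bcd > 0 else 0.0

    # 1 - 1/exp(exp(r)) is written as 1 - exp(-exp(r)); capping r at 700 keeps
    # math.exp from overflowing, and exp(-exp(700)) already underflows to 0.0.
    return 1.0 - math.exp(-math.exp(min(ratio, 700.0)))


if __name__ == "__main__":
    good = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]  # split along x: well separated
    bad = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]   # same points, split along y
    single = [[(0, 0), (2, 0), (0, 2), (2, 2)]]    # k = 1, so bcd = 0
    for name, c in [("good", good), ("bad", bad), ("single", single)]:
        print(name, round(score_function(c), 4))
```

To use such an index for choosing the number of clusters, one would score candidate partitions (for example, k-means results for k = 1, 2, 3, ...) and keep the highest-scoring one; this selection rule is likewise an assumption for illustration, not a procedure taken from the abstract.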