A Bounded Index for Cluster Validity

Authors:
Sandro Saitta; Benny Raphael; Ian F. Smith
Affiliations:
Ecole Polytechnique Fédérale de Lausanne (EPFL), Station 18, 1015 Lausanne, Switzerland (all authors)
Venue:
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2007

Abstract

Clustering is one of the best-known types of unsupervised learning. Evaluating the quality of clustering results and determining the number of clusters in the data are important issues. Most current validity indices cover only a subset of the important aspects of clusters. Moreover, these indices are relevant only for data sets containing at least two clusters. In this paper, a new bounded index for cluster validity, called the score function (SF), is introduced. The score function is based on standard cluster properties. Several artificial and real-life data sets are used to evaluate its performance, and it is tested against four existing validity indices. The proposed index is found to be always as good as or better than these indices in the case of hyperspheroidal clusters. It is shown to work well on multi-dimensional data sets and is able to accommodate unique and sub-cluster cases.
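The abstract does not state the formula of the score function, so the following is only a minimal sketch of how a bounded validity index can be built from standard cluster properties. It assumes a between-cluster distance (bcd) and a within-cluster distance (wcd) computed from centroids, and it assumes a double-exponential mapping of their difference into the interval (0, 1). The function name `bounded_score`, the exact definitions of bcd and wcd, and the mapping are assumptions made for illustration and are not confirmed by the text above.

```python
import numpy as np


def bounded_score(points, labels):
    """Illustrative bounded validity score in (0, 1); higher is better.

    Assumed definitions (not taken from the abstract):
      bcd = sum_i ||z_i - z_tot|| * n_i / (n * k)
      wcd = sum_i mean_{x in cluster i} ||x - z_i||
      score = 1 - exp(-exp(bcd - wcd))
    where z_i is the centroid of cluster i and z_tot the overall centroid.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    n = len(points)
    cluster_ids = np.unique(labels)
    k = len(cluster_ids)
    overall_centroid = points.mean(axis=0)

    bcd = 0.0
    wcd = 0.0
    for cid in cluster_ids:
        members = points[labels == cid]
        centroid = members.mean(axis=0)
        # Distance of this centroid from the overall centroid, weighted by size.
        bcd += np.linalg.norm(centroid - overall_centroid) * len(members)
        # Mean distance of the members to their own centroid.
        wcd += np.linalg.norm(members - centroid, axis=1).mean()
    bcd /= n * k

    # Double exponential keeps the result between 0 and 1.
    return float(1.0 - np.exp(-np.exp(bcd - wcd)))


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [10, 0], [10, 1]]
    print(bounded_score(data, [0, 0, 1, 1]))  # compact, well-separated split
    print(bounded_score(data, [0, 1, 0, 1]))  # split that mixes the two groups
    print(bounded_score(data, [0, 0, 0, 0]))  # single cluster is still defined
```

Under these assumptions the score remains defined when all points share one label, because bcd is then zero and only the within-cluster term contributes. This mirrors the abstract's remark that the index is not restricted to data sets with at least two clusters.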