Validity index for clusters of different sizes and densities

Authors:
Krista Rizman Žalik; Borut Žalik
Affiliations:
University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova 17, SI-2000 Maribor, Slovenia;University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova 17, SI-2000 Maribor, Slovenia
Venue:
Pattern Recognition Letters
Year:
2011

Abstract

Cluster validity indices are used to validate the results of clustering and to find the set of clusters that best fits the natural partitions of a given data set. Most previous validity indices depend considerably on the number of data objects in clusters, on cluster centroids, and on average values. They tend to ignore small clusters and clusters with low density. Two cluster validity indices are proposed for efficient validation of partitions containing clusters that differ widely in size and density. The first proposed index exploits a compactness measure and a separation measure, and the second index is based on an overlap measure and a separation measure. The compactness and overlap measures are calculated from only a few data objects of a cluster, while the separation measure uses all data objects. The compactness measure is calculated only from the data objects of a cluster that are far enough away from the cluster centroid, while the overlap measure is calculated from data objects that are near enough to one or more other clusters. A good partition is expected to have a low degree of overlap, a large separation distance, and high compactness. The maximum value of the ratio of compactness to separation and the minimum value of the ratio of overlap to separation indicate the optimal partition. Tests of both proposed indices on artificial data sets and three well-known real data sets showed their effectiveness and reliability.
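
The abstract gives no formulas, so the following Python sketch only illustrates the general shape of an overlap-to-separation ratio. It is not the authors' index. The function name `overlap_separation_index`, the `ratio` threshold, and the definitions of "overlap" and "separation" used here are all assumptions chosen for illustration: an object counts as near another cluster when its distance to its own centroid exceeds `ratio` times its distance to the nearest foreign centroid, and separation is taken as the mean distance from every object to its nearest foreign centroid.

```python
import numpy as np


def overlap_separation_index(X, labels, ratio=0.5, eps=1e-12):
    """Illustrative overlap/separation ratio (assumed definitions, not the paper's).

    Lower values suggest a partition with less overlap relative to separation.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).ravel()
    ids, own = np.unique(labels, return_inverse=True)
    k = ids.size
    if k < 2:
        raise ValueError("need at least two clusters")

    # One centroid per cluster, shape (k, d).
    centroids = np.stack([X[own == j].mean(axis=0) for j in range(k)])

    # Distance from every object to every centroid, shape (n, k).
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

    rows = np.arange(X.shape[0])
    d_own = dist[rows, own]

    # Distance to the nearest centroid of a different cluster.
    foreign = dist.copy()
    foreign[rows, own] = np.inf
    d_other = foreign.min(axis=1)

    # Assumed overlap: only objects near enough to another cluster contribute.
    near = d_own > ratio * d_other
    contribution = d_own / (d_own + d_other + eps)
    overlap = contribution[near].sum() / X.shape[0]

    # Assumed separation: uses all objects, as the abstract describes.
    separation = d_other.mean()

    return float(overlap / max(separation, eps))
```

Under these assumptions, one would evaluate the function on candidate partitions (for example, the outputs of a clustering algorithm for several numbers of clusters) and prefer the partition with the smallest value, mirroring the abstract's statement that the minimum ratio of overlap to separation indicates the optimal partition.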