Use of a fuzzy granulation-degranulation criterion for assessing cluster validity

Authors:
Sanghamitra Bandyopadhyay;Sriparna Saha;Witold Pedrycz
Affiliations:
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India;Image Processing and Modeling, Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, Speyerer Strasse 6, D-69115 Heidelberg, Germany;Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada T6G 2G7 and Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Venue:
Fuzzy Sets and Systems
Year:
2011

Abstract

The identification of a suitable clustering algorithm for partitioning a data set, and the assessment of the validity of the resulting partitioning, are ongoing quests in unsupervised learning. In this study, a fuzzy granulation-degranulation criterion is proposed to evaluate the goodness of a fuzzy partitioning of the data. This criterion, in turn, is used to determine the clustering algorithm best suited to a particular data set. In general, the quality of a partitioning is measured by computing its within-cluster variance, which quantifies the compactness of the clusters. Here, instead, a new error function that reflects how well the computed cluster centers represent the whole data set is used as the goodness measure of the partitioning. Thus the clustering algorithm whose cluster centers approximate the whole data set most closely is considered the most suitable. This fuzzy granulation-degranulation criterion is then used to develop six new cluster validity indices. These indices mimic the definitions of six existing, well-known cluster validity indices, namely the PBM-index, XB-index, PS-index, FS-index, K-index and SV-index, but use the new fuzzy granulation-degranulation based error function in place of cluster compactness. To evaluate how effectively the proposed error function identifies the appropriate clustering algorithm for a particular data set, eight clustering algorithms are evaluated on six artificially generated and six real-life data sets: K-means, Fuzzy C-means, GAK-means (a genetic algorithm based K-means algorithm), a newly developed genetic point symmetry based clustering technique (GAPS-clustering), the Average Linkage clustering algorithm, the Expectation Maximization (EM) clustering algorithm, the Self-Organizing Map (SOM) and the Spectral clustering technique. Results show that GAK-means is the most appropriate algorithm for most of the data sets used in the experiments.
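The abstract does not state the error function explicitly. A minimal sketch, assuming the common fuzzy C-means formulation: granulation maps each point x_k to memberships u_ik = 1 / sum_j (||x_k - v_i|| / ||x_k - v_j||)^(2/(m-1)) in the centers v_i, and degranulation reconstructs x_k as the membership-weighted average x̂_k = sum_i u_ik^m v_i / sum_i u_ik^m. The error is then the total squared reconstruction error sum_k ||x_k - x̂_k||^2. The function names and the fuzzifier m = 2 are illustrative assumptions, not taken from the paper.

```python
# Sketch of a granulation-degranulation reconstruction error (assumed FCM formulation).

def memberships(x, centers, m=2.0):
    """Granulation: FCM-style membership of point x in each center."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
    # A point lying exactly on a center belongs fully to that center.
    for i, d in enumerate(d2):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    p = 1.0 / (m - 1.0)  # exponent applied to ratios of squared distances
    return [
        1.0 / sum((d2[i] / d2[j]) ** p for j in range(len(centers)))
        for i in range(len(centers))
    ]


def degranulate(u, centers, m=2.0):
    """Degranulation: membership-weighted average of the centers."""
    w = [ui ** m for ui in u]
    s = sum(w)  # positive, since memberships sum to 1
    dim = len(centers[0])
    return [sum(w[i] * centers[i][k] for i in range(len(centers))) / s for k in range(dim)]


def reconstruction_error(points, centers, m=2.0):
    """Total squared error between each point and its reconstruction."""
    total = 0.0
    for x in points:
        x_hat = degranulate(memberships(x, centers, m), centers, m)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = [[0.0, 0.5], [10.0, 0.5]]  # one center per visible group
bad = [[5.0, 0.0], [5.0, 1.0]]    # centers between the groups
print(reconstruction_error(points, good) < reconstruction_error(points, bad))  # → True
```

Under this assumption, centers that sit inside the natural groups reconstruct the points far better than centers placed between them, which is the property the criterion uses to rank clustering algorithms.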
The effectiveness of the proposed cluster validity indices in automatically identifying the appropriate number of clusters is then demonstrated on the 12 data sets mentioned above. For comparison, results obtained with the original versions of these cluster validity indices and with a density-based clustering technique are also presented.
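Choosing the number of clusters with a validity index means computing the index for each candidate K and keeping the optimum. The sketch below is hypothetical: it assumes an XB-style index in which a reconstruction error replaces the fuzzy compactness term in the numerator of the original Xie-Beni index (sum_i sum_k u_ik^2 ||x_k - v_i||^2 divided by n times the minimum squared center separation), with lower values better. The paper's exact definitions may differ, and the function names and input numbers are illustrative only.

```python
# Hypothetical XB-style index using a reconstruction error instead of compactness.

def min_center_separation_sq(centers):
    """Smallest squared Euclidean distance between two distinct centers (needs K >= 2)."""
    k = len(centers)
    return min(
        sum((a - b) ** 2 for a, b in zip(centers[i], centers[j]))
        for i in range(k)
        for j in range(i + 1, k)
    )


def xb_like_index(error, centers, n):
    """Reconstruction error divided by n times the minimum squared separation."""
    return error / (n * min_center_separation_sq(centers))


def pick_num_clusters(candidates, n):
    """candidates maps K -> (reconstruction_error, centers); lower index is better."""
    return min(candidates, key=lambda k: xb_like_index(*candidates[k], n))


# Illustrative numbers only, not results from the paper.
candidates = {
    2: (1.0, [[0.0, 0.5], [10.0, 0.5]]),
    3: (0.9, [[0.0, 0.5], [10.0, 0.5], [10.0, 1.0]]),
}
print(pick_num_clusters(candidates, n=4))  # → 2
```

Splitting a natural group (K = 3) lowers the error only slightly but places two centers very close together, so the separation term penalizes it and K = 2 is preferred.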