A binomial noised model for cluster validation

Authors:
Dvora Toledano-Kitai; Renata Avros; Zeev Volkovich; Gerhard-Wilhelm Weber; Orly Yahalom
Affiliations:
Dvora Toledano-Kitai, Renata Avros, Zeev Volkovich, Orly Yahalom: Software Engineering Department, ORT Braude College of Engineering, Karmiel, Israel
Gerhard-Wilhelm Weber: Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey; University of Siegen, Siegen, Germany; University of Aveiro, Aveiro, Portugal; Universiti Teknologi Mal ...
Venue:
Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology - Recent Advances in Soft Computing: Theories and Applications
Year:
2013

Abstract

Cluster validation is the task of estimating the quality of a given partition of a data set into clusters of similar objects. Clustering algorithms typically require the desired number of clusters as an input parameter. We consider the cluster validation problem of determining the optimal “true” number of clusters. We adopt the stability testing approach, according to which repeated applications of a given clustering algorithm yield similar results when the specified number of clusters is correct. To implement this idea, we draw pairs of independent, equal-sized samples, where one sample in each pair is drawn from the data source and the other is drawn from a noised version of it. We then run the same clustering method on both samples of each pair and measure the similarity between the resulting partitions using a general k-Nearest Neighbor Binomial model. These similarity measurements allow us to estimate the correct number of clusters. A series of numerical experiments on both synthetic and real-world data shows that the proposed approach compares favorably with other methods. In particular, using a noised data set produces significantly better results than using two independent samples that are both drawn from the data source.
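The resampling procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not taken from the paper: k-means with k-means++ seeding stands in for "the clustering method", the noise is additive Gaussian with a hypothetical scale `sigma`, clusters of the two samples are matched by minimizing total centroid distance, and the similarity statistic is a simplified stand-in for the authors' k-Nearest Neighbor Binomial model. Inside each matched cluster, the number of a point's `knn` nearest neighbors that come from its own sample should behave roughly like a Binomial(`knn`, 1/2) variable when the two partitions agree, so the sketch averages the corresponding z-scores. All function names (`kmeans`, `align`, `knn_binomial_score`, `noised`, `estimate_k`) are hypothetical.

```python
import itertools
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=50):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centroids).

    ASSUMPTION: k-means stands in for "the clustering method" of the paper.
    """
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(dist2(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights)[0])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the previous centroid if a cluster becomes empty
            new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def align(labels2, cents1, cents2, k):
    """Relabel sample-2 clusters to best match sample-1 centroids (brute force, small k)."""
    best = min(itertools.permutations(range(k)),
               key=lambda perm: sum(dist2(cents1[perm[j]], cents2[j]) for j in range(k)))
    return [best[lab] for lab in labels2]


def knn_binomial_score(s1, l1, s2, l2, knn):
    """Mean z-score of same-sample counts among each point's knn neighbours in its cluster.

    SIMPLIFIED STAND-IN for the paper's model: with equal-sized, well-mixed samples,
    the same-sample count is treated as approximately Binomial(knn, 1/2).
    """
    pts = [(p, 0, lab) for p, lab in zip(s1, l1)] + [(p, 1, lab) for p, lab in zip(s2, l2)]
    zs = []
    for c in set(l1) | set(l2):
        members = [(p, s) for p, s, lab in pts if lab == c]
        m = min(knn, len(members) - 1)  # cannot use more neighbours than exist
        if m < 1:
            continue
        for i, (p, s) in enumerate(members):
            nearest = sorted((dist2(p, q), t) for j, (q, t) in enumerate(members) if j != i)[:m]
            same = sum(1 for _, t in nearest if t == s)
            zs.append((same - m / 2) / math.sqrt(m / 4))
    return sum(zs) / len(zs) if zs else 0.0


def noised(points, sigma, rng):
    """ASSUMPTION: additive isotropic Gaussian noise; the paper's noise model may differ."""
    return [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]


def estimate_k(data, candidates, n_pairs=10, sample_size=60, sigma=0.1, knn=5, seed=0):
    """Return the candidate with the lowest mean score, plus all mean scores."""
    rng = random.Random(seed)
    scores = {}
    for k in candidates:
        vals = []
        for _ in range(n_pairs):
            s1 = rng.sample(data, sample_size)                      # sample from the data source
            s2 = noised(rng.sample(data, sample_size), sigma, rng)  # sample from the noised source
            l1, c1 = kmeans(s1, k, rng)
            l2, c2 = kmeans(s2, k, rng)
            vals.append(knn_binomial_score(s1, l1, s2, align(l2, c1, c2, k), knn))
        scores[k] = sum(vals) / n_pairs
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    gen = random.Random(1)
    # synthetic data: three Gaussian blobs (illustrative only)
    centers = [(0.0, 0.0), (8.0, 0.0), (4.0, 7.0)]
    data = [(cx + gen.gauss(0, 0.7), cy + gen.gauss(0, 0.7)) for cx, cy in centers for _ in range(100)]
    k_hat, scores = estimate_k(data, candidates=range(2, 6))
    print(k_hat, {k: round(v, 2) for k, v in scores.items()})
```

A lower score means that, within each cluster, points from the two samples interleave roughly as they would if both partitions described the same structure; picking the candidate with the lowest average score is this sketch's reading of the stability principle. A cluster that appears in only one sample pushes the same-sample count to `knn` and the score upward, and clusters that split the data differently in the two samples tend to have the same effect.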