This paper presents a method for assessing cluster stability. Combined with a clustering algorithm, the method yields an estimate of the data partition, namely the number of clusters. We adopt the cluster stability standpoint in which clusters are viewed as islands of "high" density in a sea of "low" density, so that each cluster is associated with its high-density core. Our approach evaluates the goodness of a cluster by the similarity between the entire cluster and its core. We propose to measure this resemblance by two-sample tests or by probability distances between the appropriate probability distributions. The distances are calculated on clustered samples drawn from the source population according to two different distributions. The first is the underlying distribution of the data set. The second is constructed to represent the clusters' cores: a variant of k-nearest-neighbor density estimation is applied so that items belonging to cores have a much higher chance of being selected. Because the sample distribution is unknown, a distribution-free two-sample test is required to examine this correspondence. To construct such a test, we use distance functions built on negative definite kernels. In practice, outliers in the samples and limitations of the clustering algorithm contribute heavily to the noise level. Consequently, the distance values have to be determined for many pairs of samples, which yields an empirical distribution of the distances. This distribution depends on the number of clusters examined. To prevent this dependence from biasing the results, we normalize the distances. It is conjectured that the true number of clusters yields the most concentrated normalized distribution. To measure the concentration, we use the sample mean and the sample 25th percentile. The paper demonstrates the good performance of the proposed method on synthetic and real-world data.
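The two building blocks named in the abstract, core-biased sampling and a kernel-based distance between samples, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`knn_core_weights`, `energy_distance`, `pairwise_dist`) are hypothetical, the weighting rule (probability proportional to the inverse of the k-th neighbor distance raised to the dimension) is only one plausible k-nearest-neighbor density variant, and the Euclidean distance is assumed as the negative definite kernel, which gives the so-called energy distance. The clustering step, the normalization, and the concentration measure are left out.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between all rows of A and all rows of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_core_weights(X, k=5):
    """Sampling probabilities proportional to a k-NN density estimate.

    Assumption: density ~ 1 / r_k^d, where r_k is the distance to the
    k-th nearest neighbor and d is the dimension of the data.
    """
    n, d = X.shape
    D = pairwise_dist(X, X)
    # After sorting, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbor.
    r_k = np.sort(D, axis=1)[:, k]
    density = 1.0 / np.maximum(r_k, 1e-12) ** d
    return density / density.sum()

def energy_distance(A, B):
    """Distance built on the negative definite kernel |x - y|.

    V-statistic form of 2 E|a-b| - E|a-a'| - E|b-b'|, which is
    non-negative and zero when both samples coincide.
    """
    return (2.0 * pairwise_dist(A, B).mean()
            - pairwise_dist(A, A).mean()
            - pairwise_dist(B, B).mean())

# Illustrative data: one synthetic Gaussian cluster (an assumption).
rng = np.random.default_rng(0)
cluster = rng.normal(size=(300, 2))
w = knn_core_weights(cluster, k=10)

# First sample follows the cluster's own distribution,
# second sample favors the high-density core.
uniform_sample = cluster[rng.choice(300, size=100, replace=False)]
core_sample = cluster[rng.choice(300, size=100, replace=False, p=w)]
dist_value = energy_distance(uniform_sample, core_sample)
```

In a full procedure along the lines of the abstract, `dist_value` would be recomputed for many sample pairs and for each candidate number of clusters, and the resulting normalized distributions would then be compared for concentration.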