On a Minimal Spanning Tree Approach in the Cluster Validation Problem

Authors:
Zeev Barzily;Zeev Volkovich;Başak Akteke-Öztürk;Gerhard-Wilhelm Weber
Affiliations:
ORT Braude College of Engineering, 21982 Karmiel, Israel, e-mail: zbarzily@braude.ac.il, vlvolkov@braude.ac.il;ORT Braude College of Engineering, 21982 Karmiel, Israel, e-mail: zbarzily@braude.ac.il, vlvolkov@braude.ac.il;Institute of Applied Mathematics, Middle East Technical University, 06531 Ankara, Turkey, e-mail: bozturk@metu.edu.tr;Institute of Applied Mathematics, Middle East Technical University, 06531 Ankara, Turkey and Faculty of Economics, Business and Law, University Siegen, Hölderlinstrasse 3, 57076 Germany, e-ma ...
Venue:
Informatica
Year:
2009

Abstract

In this paper, a method for the study of cluster stability is proposed. We draw pairs of samples from the data according to two sampling distributions. The first distribution corresponds to the high-density zones of the data-element distribution and is thus associated with the cluster cores. The second, associated with the cluster margins, is related to the low-density zones. The samples are clustered, and the two resulting partitions are compared. The partitions are considered consistent if the obtained clusters are similar. The resemblance is measured by the total number of edges in the clusters' minimal spanning trees that connect points from different samples. We use the Friedman and Rafsky two-sample test statistic. Under the homogeneity hypothesis, this statistic is asymptotically normally distributed. Thus, it can be expected that the true number of clusters corresponds to the empirical distribution of the statistic that is closest to normal. Numerical experiments demonstrate the ability of the approach to detect the true number of clusters.
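
The edge count at the heart of the comparison can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mst_edges` and `cross_sample_edges` are hypothetical, Euclidean distance is an assumption, and the sketch treats a single pooled point set (one cluster). Summing the count over all clusters and standardizing it into the Friedman and Rafsky statistic are not shown.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n - 1 tree edges as (i, j) index pairs into `points`.
    Runs in O(n^2) time, which is adequate for small samples.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known link from the tree to each vertex
    best_from = [-1] * n        # tree vertex that realises that link
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Attach the outside vertex that is closest to the current tree.
        v = min((j for j in range(n) if not in_tree[j]),
                key=best_dist.__getitem__)
        edges.append((best_from[v], v))
        in_tree[v] = True
        # Relax the links of the remaining outside vertices through v.
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[v], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = v
    return edges


def cross_sample_edges(sample_a, sample_b):
    """Count MST edges of the pooled points that join a point of A to a point of B."""
    pooled = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    return sum(1 for i, j in mst_edges(pooled) if labels[i] != labels[j])
```

For two well-separated samples such as `[(0, 0), (0, 1), (1, 0)]` and `[(10, 10), (10, 11), (11, 10)]`, `cross_sample_edges` returns 1, since only a single tree edge bridges the two groups. For samples interleaved on a line, such as `[(0,), (2,), (4,)]` and `[(1,), (3,), (5,)]`, it returns 5, because every tree edge joins points from different samples. Many cross-sample edges therefore indicate well-mixed samples, which is the situation expected under the homogeneity hypothesis.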