Uniformity Testing Using Minimal Spanning Tree

Authors:
Anil K. Jain; Fan Xiao
Venue:
ICPR '02: Proceedings of the 16th International Conference on Pattern Recognition (ICPR'02), Volume 4
Year:
2002

Citing 0
Cited 9

A statistical model of cluster stability
Pattern Recognition

On a Minimal Spanning Tree Approach in the Cluster Validation Problem
Informatica

Automatically finding clusters in normalized cuts
Pattern Recognition

Probabilistic auto-tuning for architectures with complex constraints
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era

A multivariate uniformity test for the case of unknown support
Statistics and Computing

Data clustering: a user’s dilemma
PReMI'05 Proceedings of the First International Conference on Pattern Recognition and Machine Intelligence

Statistical measures of two dimensional point set uniformity
Computational Statistics & Data Analysis

An empirical study of tests for uniformity in multidimensional data
Computational Statistics & Data Analysis

How Many Clusters: A Validation Index for Arbitrary-Shaped Clusters
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Abstract

Testing multivariate data for uniformity is the initial step in exploratory pattern analysis. We propose a new uniformity test that first computes the maximum (standardized) edge length in the minimal spanning tree (MST) of the given data. A large maximum edge length indicates well-separated clusters or outliers in the data. For data that pass this edge inconsistency test, we generate two subsamples by weighted resampling, where the weights are computed from the normalized edge lengths of the MST of the entire data set. The uniformity of the data is then estimated by running the two-sample MST test on these two subsamples. Experiments with simulated and real data show the potential of the proposed test in identifying uniform or weakly clustered data. The test can also be used to rank data sets by their degree of uniformity.
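
As a rough illustration of the first step described in the abstract, the sketch below builds a Euclidean MST with Prim's algorithm and reports the largest edge length in standardized form. The abstract does not say how the edge length is standardized or which threshold is applied, so the z-score used here (maximum minus mean, divided by the population standard deviation of the MST edge lengths) is an assumption. The function names `mst_edge_lengths` and `max_standardized_edge` are hypothetical, and the weighted resampling and two-sample MST test are not covered.

```python
import math
import statistics


def mst_edge_lengths(points):
    """Return the n-1 edge lengths of the Euclidean MST (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    # best[i] = shortest distance from point i to any point already in the tree
    best = [math.inf] * n
    best[0] = 0.0
    lengths = []
    for step in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the starting point contributes no edge
            lengths.append(best[u])
        # Update the distances from the remaining points to the grown tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return lengths


def max_standardized_edge(points):
    """Largest MST edge length as a z-score over all MST edge lengths.

    ASSUMPTION: this z-score standardization is illustrative only; the
    abstract does not define the standardization used in the paper.
    """
    lengths = mst_edge_lengths(points)
    if not lengths:
        return 0.0
    mean = statistics.mean(lengths)
    sd = statistics.pstdev(lengths)
    if sd == 0.0:  # all edges equal: no edge stands out
        return 0.0
    return (max(lengths) - mean) / sd


# Evenly spaced points versus the same points plus one distant outlier.
evenly_spaced = [(0, 0), (1, 0), (2, 0), (3, 0)]
with_outlier = evenly_spaced + [(10, 0)]
print(max_standardized_edge(evenly_spaced))
print(max_standardized_edge(with_outlier))
```

Under this assumed standardization, the evenly spaced points give a score of zero because all MST edges have the same length, while the added outlier produces one long edge and a clearly positive score. Where to place a cutoff between the two is left open here.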