Testing the Significance of Patterns in Data with Cluster Structure

Authors:
Niko Vuokko;Petteri Kaski
Affiliations:
-;-
Venue:
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
Year:
2010

Citing 0
Cited 1

A framework for evaluating the smoothness of data-mining results

ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II

Quantified Score

Hi-index	0.00

Visualization

Abstract

Clustering is one of the basic operations in data analysis, and the cluster structure of a dataset often has a marked effect on observed patterns in data. Testing whether a data mining result is implied by the cluster structure can give substantial information on the formation of the dataset. We propose a new method for empirically testing the statistical significance of patterns in real-valued data in relation to the cluster structure. The method relies on principal component analysis and is based on the general idea of decomposing the data for the purpose of isolating the null model. We evaluate the performance of the method and the information it provides on various real datasets. Our results show that the proposed method is robust and provides nontrivial information about the origin of patterns in data, such as the source of classification accuracy and the observed correlations between attributes.