Multiple hypothesis testing in pattern discovery

  • Authors: Sami Hanhijärvi
  • Affiliations: Department of Information and Computer Science, Aalto University, Finland
  • Venue: DS'11 Proceedings of the 14th international conference on Discovery science
  • Year: 2011

Abstract

The problem of multiple hypothesis testing arises when more than one hypothesis is tested simultaneously for statistical significance. This situation is very common in data mining applications. For instance, simultaneously assessing the significance of all frequent itemsets of a single dataset entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I errors). Our contribution in this paper is to extend the multiple hypothesis testing framework so that it can be used in a generic data mining setting. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive). We show the power of our solution on real data.
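
To make the notion of FWER control concrete, the following is a minimal sketch of the classical Holm-Bonferroni step-down procedure. It is not the method proposed in the paper, which the abstract does not describe in detail; it is a standard baseline shown only for illustration. The function name `holm_reject` and the example p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure (illustrative baseline,
    not the paper's method). Returns a list of booleans, True where
    the corresponding hypothesis is rejected at FWER level alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared to alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


# Hypothetical p-values, e.g. one per candidate itemset.
p = [0.001, 0.04, 0.03, 0.012]
print(holm_reject(p))  # [True, False, False, True]
```

With four hypotheses and alpha = 0.05, the thresholds are 0.0125, 0.0167, 0.025 and 0.05 in order. The two smallest p-values pass, the third (0.03) exceeds 0.025, and the procedure stops there. Procedures of this kind assume the set of hypotheses is fixed before looking at the data, which is generally not the case when patterns are first mined and then tested.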