The problem of multiple hypothesis testing arises when more than one hypothesis must be tested simultaneously for statistical significance. This situation is very common in data mining applications: for instance, simultaneously assessing the significance of all frequent itemsets of a single dataset entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I errors). Our contribution in this paper is to extend the multiple hypothesis testing framework for use in a generic data mining setting. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive), and we demonstrate the power of our solution on real data.
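The abstract does not spell out the paper's own FWER-controlling procedure, but the general idea it builds on can be illustrated with the classical Bonferroni correction: when m hypotheses (e.g. one per frequent itemset) are tested at once, each per-hypothesis threshold is tightened to alpha/m so that the probability of even one false positive stays below alpha. The sketch below is purely illustrative; the function name and the example p-values are hypothetical and not taken from the paper.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Comparing each of the m p-values against alpha / m guarantees
    P(at least one false positive) <= alpha under any dependence
    structure among the hypotheses (Bonferroni's inequality).
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]

# Hypothetical p-values for five itemsets; with alpha = 0.05 and m = 5,
# only p-values at or below 0.01 are declared significant.
pvals = [0.001, 0.04, 0.009, 0.2, 0.012]
print(bonferroni_reject(pvals))  # [True, False, True, False, False]
```

Bonferroni is the simplest FWER guarantee but is conservative when hypotheses are strongly dependent, which is exactly the regime of overlapping frequent itemsets; that looseness is one motivation for the resampling-style approaches this line of work pursues.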