Controlling false positives in association rule mining

Authors:
Guimei Liu;Haojun Zhang;Limsoon Wong
Affiliations:
National University of Singapore;National University of Singapore;National University of Singapore
Venue:
Proceedings of the VLDB Endowment
Year:
2011



Abstract

Association rule mining is an important problem in data mining. It enumerates and tests a large number of rules on a dataset and outputs the rules that satisfy user-specified constraints. Because so many rules are tested, rules that do not represent a real systematic effect in the data can satisfy the given constraints purely by chance. Hence association rule mining often suffers from a high risk of false positive errors. There is a lack of comprehensive studies on controlling false positives in association rule mining. In this paper, we adopt three multiple testing correction approaches (the direct adjustment approach, the permutation-based approach, and the holdout approach) to control false positives in association rule mining, and conduct extensive experiments to study their performance. Our results show that (1) numerous spurious rules are generated if no correction is made, and (2) all three approaches control false positives effectively. Among the three approaches, the permutation-based approach has the highest power to detect real association rules, but it is computationally very expensive. We employ several techniques to reduce its cost effectively.
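
To make the three correction approaches named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions that the abstract does not state: rules have the form {item} -> positive class label, each rule is scored with a one-sided Fisher exact test, the direct adjustment is a Bonferroni correction, and the permutation-based cutoff is taken from the distribution of the minimum p-value over label shuffles. All function names and the toy dataset are hypothetical.

```python
import random
from math import comb

def fisher_p(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    chance of at least `a` co-occurrences when the margins are fixed."""
    n, row, col = a + b + c + d, a + b, a + c
    tail = sum(comb(col, k) * comb(n - col, row - k)
               for k in range(a, min(row, col) + 1))
    return tail / comb(n, row)

def rule_p_values(transactions, labels, items):
    """p-value of every candidate rule {item} -> positive label (assumed rule form)."""
    n, pos = len(labels), sum(labels)
    pvals = {}
    for item in items:
        cover = sum(1 for t in transactions if item in t)
        a = sum(1 for t, y in zip(transactions, labels) if item in t and y)
        pvals[item] = fisher_p(a, cover - a, pos - a, n - cover - (pos - a))
    return pvals

def bonferroni(pvals, alpha=0.05):
    """Direct adjustment: divide alpha by the number of rules tested."""
    if not pvals:
        return set()
    cutoff = alpha / len(pvals)
    return {rule for rule, p in pvals.items() if p <= cutoff}

def permutation_based(transactions, labels, items, alpha=0.05, n_perm=200, seed=0):
    """Permutation approach: shuffle the labels to break any real association,
    record the smallest p-value per shuffle, and use the alpha quantile of
    those minima as the significance cutoff."""
    rng = random.Random(seed)
    shuffled = list(labels)
    minima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        minima.append(min(rule_p_values(transactions, shuffled, items).values()))
    cutoff = sorted(minima)[int(alpha * n_perm)]
    observed = rule_p_values(transactions, labels, items)
    return {rule for rule, p in observed.items() if p < cutoff}

def holdout(transactions, labels, items, alpha=0.05):
    """Holdout approach: pick candidates on the first half, then confirm them
    on the second half, correcting only for the number of candidates.
    Assumes the records are already in random order."""
    half = len(labels) // 2
    explore = rule_p_values(transactions[:half], labels[:half], items)
    candidates = [rule for rule, p in explore.items() if p <= alpha]
    confirm = rule_p_values(transactions[half:], labels[half:], candidates)
    return bonferroni(confirm, alpha)

# Toy data: item "a" is planted to match the label exactly; "b" and "c"
# follow unrelated periodic patterns.
transactions = [{name for name, step in (("a", 2), ("b", 3), ("c", 5)) if i % step == 0}
                for i in range(200)]
labels = [i % 2 == 0 for i in range(200)]
items = ["a", "b", "c"]

print("direct adjustment:", bonferroni(rule_p_values(transactions, labels, items)))
print("permutation-based:", permutation_based(transactions, labels, items))
print("holdout:          ", holdout(transactions, labels, items))
```

On this toy data only the planted rule should survive each correction. The sketch also hints at the cost issue the abstract raises: the permutation-based function re-scores every rule once per shuffle, so its running time grows with the number of permutations times the number of rules.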