New probabilistic interest measures for association rules

Authors:
Michael Hahsler; Kurt Hornik
Affiliations:
Vienna University of Economics and Business Administration, Augasse 2-6, A-1090 Vienna, Austria (corresponding author; Tel.: +43(1)31336/6081; Fax: +43(1)31336/739; E-mail: hahsler@ai.wu-wien.ac.at); Vienna University of Economics and Business Administration, Augasse 2-6, A-1090 Vienna, Austria
Venue:
Intelligent Data Analysis
Year:
2007


Abstract

Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. We begin by presenting a simple probabilistic framework for transaction data that can be used to simulate transaction data in which no associations are present. We use such simulated data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules and that lift performs poorly at filtering random noise in transaction data. Based on the probabilistic framework, we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.
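
The two baseline measures named in the abstract, confidence and lift, can be illustrated with a minimal Python sketch using their standard textbook definitions. The function names and the toy data are hypothetical, chosen only for this illustration. The `simulate_independent` helper is a simplified stand-in for "data with no associations": it assumes each item enters each transaction independently with a fixed probability, and it is not claimed to reproduce the paper's exact framework.

```python
import random


def support_count(transactions, itemset):
    """Count the transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, lhs, rhs):
    """conf(X -> Y) = count(X and Y) / count(X)."""
    c_x = support_count(transactions, lhs)
    if c_x == 0:
        return float("nan")
    return support_count(transactions, set(lhs) | set(rhs)) / c_x


def lift(transactions, lhs, rhs):
    """lift(X -> Y) = supp(X and Y) / (supp(X) * supp(Y))."""
    n = len(transactions)
    c_x = support_count(transactions, lhs)
    c_y = support_count(transactions, rhs)
    if c_x == 0 or c_y == 0:
        return float("nan")
    c_xy = support_count(transactions, set(lhs) | set(rhs))
    return n * c_xy / (c_x * c_y)


def simulate_independent(n_transactions, item_probs, seed=0):
    """Assumed null model: every item is drawn independently per transaction."""
    rng = random.Random(seed)
    return [
        {item for item, p in item_probs.items() if rng.random() < p}
        for _ in range(n_transactions)
    ]


# Hypothetical toy data: "a" and "b" each occur 3 times, together twice.
toy = [{"a", "b"}, {"a"}, {"a", "b"}, {"b"}, {"c"}]
print(confidence(toy, {"a"}, {"b"}))  # 2/3
print(lift(toy, {"a"}, {"b"}))        # 5 * 2 / (3 * 3) = 10/9

# Data without built-in associations; any lift far from 1 here is noise.
noise = simulate_independent(1000, {"a": 0.05, "b": 0.02, "c": 0.5})
print(lift(noise, {"a"}, {"b"}))
```

Because lift divides by the product of two counts, a rule between two rare items can obtain a lift far from 1 from only a handful of chance co-occurrences. This is one way to see the noise sensitivity that the abstract reports for lift.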