Re-examination of interestingness measures in pattern mining: a unified framework

Authors:
Tianyi Wu; Yuguo Chen; Jiawei Han
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA; Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, USA
Venue:
Data Mining and Knowledge Discovery
Year:
2010


Abstract

Numerous interestingness measures have been proposed in statistics and data mining to assess object relationships. This is especially important in recent studies of association or correlation pattern mining. However, it is still not clear whether there is any intrinsic relationship among many proposed measures, and which one is truly effective at gauging object relationships in large data sets. Recent studies have identified a critical property, null-(transaction) invariance, for measuring associations among events in large data sets, but many measures do not have this property. In this study, we re-examine a set of null-invariant interestingness measures and find that they can be expressed as the generalized mathematical mean, leading to a total ordering of them. Such a unified framework provides insights into the underlying philosophy of the measures and helps us understand and select the proper measure for different applications. Moreover, we propose a new measure called Imbalance Ratio to gauge the degree of skewness of a data set. We also discuss the efficient computation of interesting patterns of different null-invariant interestingness measures by proposing an algorithm, GAMiner, which complements previous studies. Experimental evaluation verifies the effectiveness of the unified framework and shows that GAMiner speeds up the state-of-the-art algorithm by an order of magnitude.
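
The generalized-mean view described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's GAMiner algorithm, and the formulas are assumptions: the abstract gives none, so the code uses the definitions commonly found in the association mining literature. Each measure is taken as a power mean of the two conditional probabilities P(B|A) and P(A|B), with exponent minus infinity for AllConf, 0 for Cosine, 1 for Kulczynski, and plus infinity for MaxConf. The exponent -1 row is labeled simply as the harmonic mean, which equals 2·sup(AB)/(sup(A)+sup(B)). The Imbalance Ratio formula is likewise the form usually quoted for two itemsets and should be checked against the paper. The function and parameter names (`generalized_mean`, `null_invariant_measures`, `imbalance_ratio`, `sup_a`, `sup_b`, `sup_ab`) are hypothetical.

```python
import math


def generalized_mean(values, k):
    """Power mean of positive values with exponent k (k may be +/- infinity)."""
    if k == -math.inf:
        return min(values)
    if k == math.inf:
        return max(values)
    if k == 0:
        # Limit of the power mean as k -> 0 is the geometric mean.
        return math.exp(sum(math.log(v) for v in values) / len(values))
    return (sum(v ** k for v in values) / len(values)) ** (1.0 / k)


def null_invariant_measures(sup_a, sup_b, sup_ab):
    """Score the pair (A, B) from support counts only; requires sup_ab > 0.

    The total number of transactions is never used, so adding transactions
    that contain neither A nor B cannot change any of the scores.
    """
    probs = [sup_ab / sup_a, sup_ab / sup_b]  # P(B|A), P(A|B)
    # Assumed exponent for each measure, listed in increasing order.
    exponents = {
        "AllConf (min)": -math.inf,
        "harmonic mean": -1,
        "Cosine (geometric)": 0,
        "Kulczynski (arithmetic)": 1,
        "MaxConf (max)": math.inf,
    }
    return {name: generalized_mean(probs, k) for name, k in exponents.items()}


def imbalance_ratio(sup_a, sup_b, sup_ab):
    """Assumed form: |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(AB))."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)


# Hypothetical skewed example: A is ten times more frequent than B.
scores = null_invariant_measures(sup_a=1000, sup_b=100, sup_ab=80)
for name, value in scores.items():
    print(f"{name:26s} {value:.4f}")
print(f"{'Imbalance Ratio':26s} {imbalance_ratio(1000, 100, 80):.4f}")
```

Because a power mean is non-decreasing in its exponent, the printed scores never decrease from AllConf to MaxConf, which is the total ordering the abstract refers to. On skewed data such as this example the measures spread far apart, and a high Imbalance Ratio flags that the choice among them matters.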