Efficient mining of top correlated patterns based on null-invariant measures

Authors:
Sangkyum Kim;Marina Barsky;Jiawei Han
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;Simon Fraser University, BC, Canada;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Year:
2011

Abstract

Mining strong correlations from transactional databases often leads to more meaningful results than mining association rules. In such mining, null (transaction)-invariance is an important property of the correlation measures. Unfortunately, some useful null-invariant measures, such as Kulczynski and Cosine, which can discover correlations even in very unbalanced cases, lack the (anti-)monotonicity property. Thus, they could previously be applied only to frequent itemsets, as a post-evaluation step. For large datasets and low support thresholds, this approach is computationally prohibitive. This paper presents new properties for all known null-invariant measures. Based on these properties, we develop efficient pruning techniques and design the Apriori-like algorithm NICOMINER for mining strongly correlated patterns directly. We develop both threshold-bounded and top-k variations of the algorithm, where top-k is used when the optimal correlation threshold is not known in advance and to give the user control over the output size. We test NICOMINER on real-life datasets from different application domains, using Cosine as an example of a null-invariant correlation measure. We show that NICOMINER outperforms the support-based approach by more than an order of magnitude, and that it is very useful for discovering top correlations in itemsets with low support.
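
To make the two properties named in the abstract concrete, the following is a minimal Python sketch, not the NICOMINER algorithm itself. It assumes the commonly used generalization of Cosine and Kulczynski to k-itemsets (the geometric and arithmetic mean, respectively, of the ratios sup(A)/sup(a_i)); the function names and the toy data are hypothetical and do not come from the paper.

```python
from math import prod


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def cosine(itemset, transactions):
    """Assumed definition: sup(A) divided by the geometric mean of item supports."""
    k = len(itemset)
    denom = prod(support([i], transactions) for i in itemset)
    if denom == 0:
        return 0.0
    return support(itemset, transactions) / denom ** (1.0 / k)


def kulczynski(itemset, transactions):
    """Assumed definition: arithmetic mean of sup(A) / sup(a_i) over the items."""
    sup_all = support(itemset, transactions)
    ratios = []
    for i in itemset:
        s = support([i], transactions)
        ratios.append(sup_all / s if s else 0.0)
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Toy data: sup(a) = 5, sup(b) = 5, sup(ab) = 4.
    base = [{"a", "b"}] * 4 + [{"a"}, {"b"}]
    # Null transactions contain none of the items being measured.
    padded = base + [{"z"}] * 1000
    print(cosine(["a", "b"], base), cosine(["a", "b"], padded))
    print(kulczynski(["a", "b"], base), kulczynski(["a", "b"], padded))

    # Lack of anti-monotonicity: a superset can score higher than its subset.
    skewed = [{"a", "b", "c"}] * 2 + [{"a"}] * 8 + [{"b"}] * 8
    print(cosine(["a", "b"], skewed), cosine(["a", "b", "c"], skewed))
```

In the first pair of calls, adding 1000 null transactions leaves both measures unchanged, because only the supports of the itemset and of its items enter the formulas. In the last call, the 3-itemset {a, b, c} has a higher Cosine than its subset {a, b}, so a miner cannot discard supersets just because a subset falls below the correlation threshold. This is why these measures do not fit plain Apriori-style pruning.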