Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs

Authors:
Hui Xiong; Shashi Shekhar; Pang-Ning Tan; Vipin Kumar
Affiliations:
University of Minnesota; University of Minnesota; Michigan State University; University of Minnesota
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Abstract

Given a user-specified minimum correlation threshold θ and a market basket database with N items and T transactions, an all-strong-pairs correlation query finds all item pairs whose correlations are above the threshold θ. However, when the numbers of items and transactions are large, the computational cost of this query can be very high. In this paper, we identify an upper bound of Pearson's correlation coefficient for binary variables. This upper bound is not only much cheaper to compute than Pearson's correlation coefficient but also exhibits a special monotone property that allows many item pairs to be pruned without even computing their upper bounds. A Two-step All-strong-Pairs corrElation queRy (TAPER) algorithm is proposed to exploit these properties in a filter-and-refine manner. Furthermore, we provide an algebraic cost model showing that, for data sets with the common Zipf or linear rank-support distributions, the computational savings from pruning remain independent of the number of items or improve as that number increases. Experimental results on synthetic and real data sets exhibit similar trends and show that the TAPER algorithm can be an order of magnitude faster than brute-force alternatives.
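To illustrate the filter-and-refine idea, here is a minimal Python sketch. It is a reconstruction and not the authors' implementation. It uses the phi coefficient (Pearson's correlation for two binary items), phi = (supp(A,B) - supp(A)supp(B)) / sqrt(supp(A)supp(B)(1 - supp(A))(1 - supp(B))). When supp(A) >= supp(B), the pair support supp(A,B) cannot exceed supp(B). Substituting supp(B) for supp(A,B) therefore gives a bound that depends only on single-item supports: sqrt(supp(B)/supp(A)) * sqrt((1 - supp(A))/(1 - supp(B))). For a fixed A, this bound shrinks as supp(B) shrinks. As a result, once items are sorted by descending support, the scan for A can stop at the first B whose bound falls below θ. Several choices in the sketch are assumptions:

- All function and variable names are hypothetical.
- Pair supports are counted by a naive scan.
- "Above θ" is read as "at least θ".

```python
# Illustrative sketch only; all names below are hypothetical, not from the paper.
from math import sqrt


def item_supports(transactions: list[set[str]]) -> dict[str, float]:
    """Support of each item: the fraction of transactions that contain it."""
    counts: dict[str, int] = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    return {item: c / len(transactions) for item, c in counts.items()}


def pair_support(transactions: list[set[str]], a: str, b: str) -> float:
    """Support of the pair {a, b}, counted by a naive scan (illustrative only)."""
    return sum(1 for t in transactions if a in t and b in t) / len(transactions)


def phi(s_a: float, s_b: float, s_ab: float) -> float:
    """Pearson's correlation coefficient of two binary items (the phi coefficient)."""
    return (s_ab - s_a * s_b) / sqrt(s_a * s_b * (1 - s_a) * (1 - s_b))


def phi_upper(s_a: float, s_b: float) -> float:
    """Support-based upper bound on phi, assuming s_a >= s_b.

    Because s_ab <= s_b, substituting s_ab = s_b into phi gives this bound,
    which needs no pair count at all.
    """
    return sqrt(s_b / s_a) * sqrt((1 - s_a) / (1 - s_b))


def taper(transactions: list[set[str]], theta: float) -> list[tuple[str, str, float]]:
    """Filter-and-refine sketch of an all-strong-pairs correlation query."""
    sup = item_supports(transactions)
    # Items present in every transaction have zero variance, so phi is undefined.
    # Sort by descending support; ties are broken by name only for determinism.
    items = sorted((i for i in sup if sup[i] < 1.0), key=lambda i: (-sup[i], i))
    strong = []
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            # Filter: sup[b] <= sup[a], and the bound only shrinks as sup[b]
            # shrinks, so no later b can pass once this one fails.
            if phi_upper(sup[a], sup[b]) < theta:
                break
            # Refine: compute the exact correlation for surviving candidates only.
            r = phi(sup[a], sup[b], pair_support(transactions, a, b))
            # Assumption: "above theta" is read as r >= theta.
            if r >= theta:
                strong.append((a, b, r))
    return strong


# Small hypothetical basket data for demonstration.
baskets = [
    {"a", "b"},
    {"a", "b"},
    {"a", "b", "c"},
    {"c"},
    {"c", "d"},
    {"d"},
    set(),
    set(),
]

if __name__ == "__main__":
    print(taper(baskets, 0.5))  # [('a', 'b', 1.0)]
```

On this toy data, raising θ to 0.8 makes the bound for item d (about 0.745) fall below the threshold. The inner loops therefore break before d's pair supports are ever counted, and the answer does not change. A practical version would likely count pair supports far more efficiently than one pass over the transactions per candidate.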