Given a user-specified minimum correlation threshold \theta and a market-basket database with N items and T transactions, an all-strong-pairs correlation query finds all item pairs whose correlation is above the threshold \theta. However, when the numbers of items and transactions are large, the computation cost of this query can be very high. The goal of this paper is to provide computationally efficient algorithms to answer the all-strong-pairs correlation query. To this end, we identify an upper bound of Pearson's correlation coefficient for binary variables. This upper bound is not only much cheaper to compute than Pearson's correlation coefficient, but also exhibits special monotone properties which allow many item pairs to be pruned without even computing their upper bounds. A Two-step All-strong-Pairs corrElation queRy (TAPER) algorithm is proposed to exploit these properties in a filter-and-refine manner. Furthermore, we provide an algebraic cost model which shows that the computation savings from pruning are independent of, or improve with, the number of items in data sets with Zipf-like or linear rank-support distributions. Experimental results on synthetic and real-world data sets exhibit similar trends and show that the TAPER algorithm can be an order of magnitude faster than brute-force alternatives. Finally, we demonstrate that the algorithmic ideas developed in the TAPER algorithm can be extended to efficiently compute negative correlations and the uncentered Pearson's correlation coefficient.
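The filter-and-refine idea can be illustrated with a minimal Python sketch. It is not the TAPER implementation, and the abstract does not spell out the bound. The form used here is an assumption derived from the fact that the joint support of two items cannot exceed the smaller of their individual supports; substituting that maximum into Pearson's correlation for binary variables gives a bound that depends only on the two single-item supports. The function names (`phi`, `phi_upper`, `strong_pairs`) and the toy transactions are hypothetical. The early `break` relies on one monotone property that follows from this formula: with the higher-support item fixed, the bound shrinks as the other item's support shrinks.

```python
from math import sqrt


def phi(s_a, s_b, s_ab):
    """Pearson's correlation of two binary items, computed from their supports."""
    return (s_ab - s_a * s_b) / sqrt(s_a * s_b * (1 - s_a) * (1 - s_b))


def phi_upper(s_a, s_b):
    """Upper bound on phi that needs only the two single-item supports.

    Assumed form: phi evaluated at the largest possible joint support,
    supp(AB) = min(supp(A), supp(B)).
    """
    hi, lo = max(s_a, s_b), min(s_a, s_b)
    return sqrt(lo / hi) * sqrt((1 - hi) / (1 - lo))


def strong_pairs(transactions, theta):
    """Return (item_a, item_b, phi) for every pair with phi >= theta."""
    n = len(transactions)
    tids = {}  # item -> set of ids of the transactions that contain it
    for tid, basket in enumerate(transactions):
        for item in basket:
            tids.setdefault(item, set()).add(tid)
    # Items present in every transaction have zero variance, so phi is undefined.
    supp = {i: len(t) / n for i, t in tids.items() if len(t) < n}
    # Sort by decreasing support (ties broken by name) to enable the early break.
    items = sorted(supp, key=lambda i: (-supp[i], i))
    result = []
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            # Filter step: cheap bound, no joint support needed.
            if phi_upper(supp[a], supp[b]) < theta:
                break  # later items have lower support, hence a lower bound
            # Refine step: compute the exact correlation for surviving pairs.
            s_ab = len(tids[a] & tids[b]) / n
            r = phi(supp[a], supp[b], s_ab)
            if r >= theta:
                result.append((a, b, r))
    return result


baskets = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"milk", "beer"},
    {"beer"},
]
print(strong_pairs(baskets, 0.5))  # [('bread', 'butter', 1.0)]
```

With a threshold of 0.8 the same call prunes every pair involving `beer` in the filter step, because the bound for those pairs is about 0.71, so no joint support is computed for them.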