TAPER: A Two-Step Approach for All-Strong-Pairs Correlation Query in Large Databases
IEEE Transactions on Knowledge and Data Engineering
Given a user-specified minimum correlation threshold θ and a market basket database with N items and T transactions, an all-strong-pairs correlation query finds all item pairs whose correlations are above the threshold θ. However, when the numbers of items and transactions are large, the computation cost of this query can be very high. In this paper, we identify an upper bound of Pearson's correlation coefficient for binary variables. This upper bound is not only much cheaper to compute than Pearson's correlation coefficient but also exhibits a special monotone property that allows many item pairs to be pruned without even computing their upper bounds. A Two-step All-strong-Pairs corrElation queRy (TAPER) algorithm is proposed to exploit these properties in a filter-and-refine manner. Furthermore, we provide an algebraic cost model which shows that the computation savings from pruning are independent of the number of items, or improve as it increases, in data sets with common Zipf or linear rank-support distributions. Experimental results from synthetic and real data sets exhibit similar trends and show that the TAPER algorithm can be an order of magnitude faster than brute-force alternatives.
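
The filter-and-refine idea can be illustrated with a minimal Python sketch. It is not the paper's implementation, and every function name in it is hypothetical. It assumes the bound is obtained from the fact that supp(AB) cannot exceed min(supp(A), supp(B)): for supp(A) >= supp(B), this gives φ <= sqrt(supp(B)/supp(A)) * sqrt((1 - supp(A))/(1 - supp(B))). For a fixed A the bound shrinks as supp(B) shrinks, so once it falls below θ the remaining lower-support items can be skipped without evaluating their bounds. The sketch also assumes that transactions are lists of string item names and that a pair qualifies when its correlation is at or above θ.

```python
from math import sqrt

def supports(transactions):
    """Fraction of transactions that contain each item."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    return {item: c / n for item, c in counts.items()}

def phi(s_a, s_b, s_ab):
    """Pearson's correlation (phi coefficient) of two binary items."""
    return (s_ab - s_a * s_b) / sqrt(s_a * s_b * (1 - s_a) * (1 - s_b))

def upper_bound(s_a, s_b):
    """Support-only bound on phi; requires s_a >= s_b (assumed form, see text)."""
    return sqrt(s_b / s_a) * sqrt((1 - s_a) / (1 - s_b))

def strong_pairs(transactions, theta):
    """Return item pairs whose phi is at least theta (hypothetical helper)."""
    n = len(transactions)
    supp = supports(transactions)
    tsets = [set(t) for t in transactions]
    # Items present in every transaction have zero variance, so phi is
    # undefined for them; they are left out. Sort by decreasing support.
    items = sorted((i for i in supp if supp[i] < 1.0),
                   key=lambda i: (-supp[i], i))
    result = []
    for idx, a in enumerate(items):
        for b in items[idx + 1:]:
            # Filter step: the bound uses only the two single-item supports.
            if upper_bound(supp[a], supp[b]) < theta:
                # Monotone property: every later b has lower support and
                # therefore a smaller bound, so the rest can be skipped.
                break
            # Refine step: count co-occurrences and compute the exact value.
            s_ab = sum(1 for t in tsets if a in t and b in t) / n
            if phi(supp[a], supp[b], s_ab) >= theta:
                result.append((a, b))
    return result
```

Calling `strong_pairs(transactions, 0.5)` on a list of item lists returns the qualifying pairs, each ordered with the higher-support item first. The refine step here rescans the transactions for every surviving pair, which keeps the sketch short but is not how a tuned implementation would count co-occurrences.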