The NP-completeness column: An ongoing guide
Journal of Algorithms
Mining association rules between sets of items in large databases
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Approximation algorithms for NP-hard problems
Approximation algorithms for NP-hard problems
Approximation algorithms for NP-hard problems
Beyond market baskets: generalizing association rules to correlations
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Efficiently mining long patterns from databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Approximating clique and biclique problems
Journal of Algorithms
Bottom-up computation of sparse and Iceberg CUBE
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Mining the most interesting rules
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Transversing itemset lattices with statistical metric pruning
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Mining frequent patterns without candidate generation
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Small is beautiful: discovering the minimal set of unexpected patterns
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient search for association rules
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Efficient computation of Iceberg cubes with complex measures
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Relations between average case complexity and approximation complexity
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
On the Complexity of Mining Quantitative Association Rules
Data Mining and Knowledge Discovery
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Discovering All Most Specific Sentences by Randomized Algorithms
ICDT '97 Proceedings of the 6th International Conference on Database Theory
Mining Surprising Patterns Using Temporal Description Length
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Explaining Differences in Multidimensional Aggregates
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Mining Optimized Support Rules for Numeric Attributes
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Finding Interesting Associations without Support Pruning
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Accurate methods for the statistics of surprise and coincidence
Computational Linguistics - Special issue on using large corpora: I
Playing hide-and-seek with correlations
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Mining top-k frequent closed itemsets is not in APX
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Hi-index | 0.00 |
This paper addresses some of the foundational issues associated with discovering the best few correlations from a database. Specifically, we consider the computational complexity of various definitions of the "top-k correlation problem," where the goal is to discover the few sets of events whose co-occurrence exhibits the smallest degree of independence. Our results show that many rigorous definitions of correlation lead to intractable and strongly inapproximable problems. Proof of this inapproximability is significant, since similar problems studied by the computer science theory community have resisted such analysis. One goal of the paper (and for future research) is to develop alternative correlation metrics whose use will both allow efficient search and produce results that are satisfactory for users.