We present a method for very high-dimensional correlation analysis. The method relies equally on rigorous search strategies and on human interaction. At each step, the method conservatively "shaves off" a fraction of the database tuples and attributes, so that most of the correlations present in the data are not affected by the decomposition. Instead, the correlations become more obvious to the user, because they are now contained in a much smaller portion of the database. This process can be repeated iteratively and interactively until only the most important correlations remain.

The main technical difficulty of the approach is determining how to "shave off" part of the database so as to preserve most correlations. We develop an algorithm for this problem that runs in polynomial time and guarantees result quality.