Dependency analysis is one of the central problems in bioinformatics and in all empirical science. In genetics, for example, an important problem is to find which gene alleles are mutually dependent, or which alleles and diseases are dependent. In ecology, a similar problem is to find dependencies between different species or groups of species. In both cases, a classical solution is to consider all pairwise dependencies between single attributes and evaluate the relationships with a statistical measure such as the χ²-measure. It is known that actual dependency structures can involve more than two attributes, but existing computational methods are too inefficient for such an exhaustive search. In this paper, we introduce efficient search methods for positive dependencies of the form X → A with typical statistical measures. The efficiency is achieved by a special kind of branch-and-bound search, which also prunes out redundant rules. Redundant attributes are especially harmful in dependency analysis, because they can blur the actual dependencies and even lead to erroneous conclusions. We consider two alternative definitions of redundancy: the classical one and a stricter one. We improve our previous algorithm for searching for the best strictly non-redundant dependency rules and introduce an entirely new algorithm for searching for the best classically non-redundant rules. According to our experiments, both algorithms prune the search space very efficiently, and in practice no minimum frequency thresholds are needed. This is an important benefit, because biological data sets are typically dense, and alternative search methods would require minimum frequency thresholds too large for any practical purpose.
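To make the rule form X → A and the classical notion of redundancy concrete, here is a minimal Python sketch under stated assumptions. It assumes the data set is a list of transactions, each the set of binary attributes that take value 1; it scores a rule with the χ² value of its 2×2 contingency table, χ² = n(n·f(XA) − f(X)f(A))² / (f(X)(n − f(X))f(A)(n − f(A))); it keeps only positive dependencies, n·f(XA) > f(X)f(A); and it assumes the usual formalization that X → A is classically redundant when some rule Y → A with a non-empty Y ⊊ X scores at least as well. The function names (`count`, `chi2`, `is_positive`, `best_rules`) and the `max_len` bound are hypothetical. Unlike the branch-and-bound algorithms described above, this sketch enumerates every candidate exhaustively, so it only illustrates what is being searched for, not how the search space is pruned, and it does not implement the stricter redundancy definition.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Row = FrozenSet[str]  # attributes that are 1 in one transaction


def count(rows: List[Row], items: FrozenSet[str]) -> int:
    """Absolute frequency: number of rows containing every item in `items`."""
    return sum(1 for r in rows if items <= r)


def chi2(rows: List[Row], x: FrozenSet[str], a: str) -> float:
    """Chi-squared value of the 2x2 contingency table of rule X -> A."""
    n = len(rows)
    fx = count(rows, x)
    fa = count(rows, frozenset([a]))
    fxa = count(rows, x | {a})
    denom = fx * (n - fx) * fa * (n - fa)
    if denom == 0:
        return 0.0  # degenerate table: X or A is constant in the data
    return n * (n * fxa - fx * fa) ** 2 / denom


def is_positive(rows: List[Row], x: FrozenSet[str], a: str) -> bool:
    """Positive dependency: P(XA) > P(X)P(A), compared in integer counts."""
    n = len(rows)
    return n * count(rows, x | {a}) > count(rows, x) * count(rows, frozenset([a]))


def best_rules(rows: List[Row], max_len: int = 3) -> List[Tuple[FrozenSet[str], str, float]]:
    """Exhaustive (unpruned) search for positive, classically non-redundant rules.

    Hypothetical helper: X -> A is dropped if some non-empty proper subset
    Y of X gives a positive rule Y -> A with chi2 >= chi2(X -> A).
    """
    items = sorted(set().union(*rows))
    scores = {}
    for a in items:
        others = [i for i in items if i != a]
        for k in range(1, max_len + 1):
            for combo in combinations(others, k):
                x = frozenset(combo)
                if is_positive(rows, x, a):
                    scores[(x, a)] = chi2(rows, x, a)

    result = []
    for (x, a), s in scores.items():
        # Only positive rules were scored; missing subsets count as -inf.
        redundant = any(
            scores.get((frozenset(y), a), float("-inf")) >= s
            for k in range(1, len(x))
            for y in combinations(sorted(x), k)
        )
        if not redundant:
            result.append((x, a, s))
    result.sort(key=lambda t: -t[2])  # best rules first
    return result


if __name__ == "__main__":
    # Toy data (invented for illustration): six transactions over a, b, c.
    rows = [frozenset(r) for r in (["a", "b", "c"], ["a", "b", "c"],
                                   ["a", "b"], ["c"], [], ["b"])]
    for x, a, s in best_rules(rows):
        print(sorted(x), "->", a, round(s, 3))
```

On this toy data, {a, c} → b is positive but is filtered out, because the simpler rule {a} → b has a higher χ² value; adding c to the antecedent only blurs the dependency, which is the kind of redundancy the abstract warns about. The exhaustive loop grows combinatorially with the number of attributes, which is why a pruning search is needed for real, dense data sets.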