Efficient Search Methods for Statistical Dependency Rules

Authors:
Wilhelmiina Hämäläinen
Affiliations:
Department of Biosciences, University of Eastern Finland, Kuopio, Finland (corresponding author). This research was supported by a personal grant from the Emil Aaltonen Fund (Emil Aaltosen Säätiö). ...
Venue:
Fundamenta Informaticae - Machine Learning in Bioinformatics
Year:
2011

Abstract

Dependency analysis is one of the central problems in bioinformatics and in all empirical science. In genetics, for example, an important problem is to find which gene alleles are mutually dependent, or which alleles and diseases are dependent. In ecology, a similar problem is to find dependencies between different species or groups of species. In both cases, a classical solution is to consider all pairwise dependencies between single attributes and to evaluate the relationships with some statistical measure such as the χ²-measure. It is known that the actual dependency structures can involve more attributes, but the existing computational methods are too inefficient for such an exhaustive search. In this paper, we introduce efficient search methods for positive dependencies of the form X → A with typical statistical measures. The efficiency is achieved by a special kind of branch-and-bound search, which also prunes out redundant rules. Redundant attributes are especially harmful in dependency analysis, because they can blur the actual dependencies and even lead to erroneous conclusions. We consider two alternative definitions of redundancy: the classical one and a stricter one. We improve our previous algorithm for searching for the best strictly non-redundant dependency rules and introduce a totally new algorithm for searching for the best classically non-redundant rules. According to our experiments, both algorithms can prune the search space very efficiently, and in practice no minimum frequency thresholds are needed. This is an important benefit, because biological data sets are typically dense, and the alternative search methods would require minimum frequency thresholds that are too large for any practical purpose.
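
To make the evaluation step mentioned in the abstract concrete, the following minimal sketch shows how a single candidate rule X → A could be scored with the χ²-measure from a 2×2 contingency table. It is only an illustration of the measure, not the paper's branch-and-bound algorithms, and it performs no redundancy pruning. The function names (`rule_counts`, `chi2_rule`, `is_positive_dependency`) and the representation of each data row as a Python set of attribute names are assumptions made for this example.

```python
from typing import List, Set, Tuple


def rule_counts(rows: List[Set[str]], x: Set[str], a: str) -> Tuple[int, int, int, int]:
    """Return (n, fx, fa, fxa): the number of rows, and the absolute
    frequencies of X, of A, and of X and A together.
    Hypothetical helper; each row is the set of attributes present in it."""
    n = len(rows)
    fx = sum(1 for r in rows if x <= r)              # rows containing every attribute of X
    fa = sum(1 for r in rows if a in r)              # rows containing A
    fxa = sum(1 for r in rows if x <= r and a in r)  # rows containing both
    return n, fx, fa, fxa


def chi2_rule(n: int, fx: int, fa: int, fxa: int) -> float:
    """Chi-squared statistic of the 2x2 table spanned by X and A.
    Uses the closed form n * (n*fxa - fx*fa)^2 / (fx * (n-fx) * fa * (n-fa))."""
    denom = fx * (n - fx) * fa * (n - fa)
    if denom == 0:
        # X or A is constant in the data, so no dependency can be measured.
        return 0.0
    return n * (n * fxa - fx * fa) ** 2 / denom


def is_positive_dependency(n: int, fx: int, fa: int, fxa: int) -> bool:
    """True when X and A co-occur more often than expected under independence."""
    return n * fxa > fx * fa


if __name__ == "__main__":
    # Toy data (invented for illustration): two alleles g1, g2 and a disease d.
    rows = [{"g1", "g2", "d"}, {"g1", "g2", "d"}, {"g1"}, {"g2"}, set(), {"d"}]
    counts = rule_counts(rows, {"g1", "g2"}, "d")
    print(counts, chi2_rule(*counts), is_positive_dependency(*counts))
```

An exhaustive search would have to score every attribute set X in this way, which is exponential in the number of attributes; this is the cost that the pruning described in the abstract is meant to avoid.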