Experimental investigation of pruning methods for relational pattern discovery

Authors:
Irene Weber
Affiliations:
Institut für Informatik, Universität Stuttgart, Stuttgart, Germany
Venue:
ILP'02 Proceedings of the 12th international conference on Inductive logic programming
Year:
2002

References:

Fast discovery of association rules. Advances in knowledge discovery and data mining
Levelwise Search and Pruning Strategies for First-Order Hypothesis Spaces. Journal of Intelligent Information Systems - Special issue on methodologies for intelligent information systems
Confirmation-guided discovery of first-order rules with Tertius. Machine Learning
An Algorithm for Multi-relational Discovery of Subgroups. PKDD '97 Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery

Abstract

Finding all interesting patterns in a database is a data mining task that typically requires a complete search through the hypothesis space. Several ILP systems address this task, e.g., [Deh98, Wro97, FL01]. Safe pruning techniques, which reduce the size of the hypothesis space without the risk of missing interesting patterns, are therefore very important for this task. This paper is concerned with the effectiveness of pruning techniques in this setting. The techniques addressed are (1) optimum estimates, (2) pruning based on subset tests, derived from the Apriori search algorithm, (3) pruning based on taxonomies, and (4) restricting the interesting patterns to the most general ones. Methods (1) to (3) are safe pruning techniques that find all interesting patterns; method (4) reduces the number of accepted patterns. The effect of these pruning methods is investigated experimentally across a range of specific task settings and on two databases. The experimental results indicate that optimum estimates and Apriori-style pruning are effective and reliable pruning techniques that incur little additional cost. The effect of taxonomies on pruning is smaller and varies across task settings. In the experiments, the restriction to most general patterns considerably reduces both the search cost and the set of accepted patterns.
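
As a rough illustration of technique (2), the following Python sketch shows Apriori-style subset pruning in a levelwise search. It is a simplification under stated assumptions: it works on propositional itemsets with a minimum-support threshold, whereas the paper concerns relational patterns, where the test would involve more general patterns rather than plain subsets. The function name `frequent_itemsets`, its parameters, and the toy data are hypothetical and not taken from the paper.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Levelwise search with Apriori-style subset pruning (toy, propositional).

    Returns a dict mapping each frequent itemset to its support count, plus
    the number of candidates whose support was actually counted.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})

    def support(candidate):
        # Number of transactions that contain every item of the candidate.
        return sum(1 for t in transactions if candidate <= t)

    frequent = {}
    evaluated = 0

    # Level 1: every single item is a candidate.
    level = []
    for item in items:
        candidate = frozenset([item])
        evaluated += 1
        count = support(candidate)
        if count >= min_support:
            frequent[candidate] = count
            level.append(candidate)

    k = 1
    while level:
        k += 1
        previous = set(level)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        next_level = []
        for candidate in candidates:
            # Subset test: if any (k-1)-subset is not frequent, the candidate
            # cannot be frequent, so it is pruned without counting support.
            if not all(frozenset(s) in previous
                       for s in combinations(candidate, k - 1)):
                continue
            evaluated += 1
            count = support(candidate)
            if count >= min_support:
                frequent[candidate] = count
                next_level.append(candidate)
        level = next_level

    return frequent, evaluated


# Hypothetical toy database of four transactions.
toy_db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
result, counted = frequent_itemsets(toy_db, min_support=2)
```

In this toy run the candidate {a, b, c} is discarded by the subset test because {b, c} is not frequent, so its support is never counted. The pruning is safe in the sense used above: support cannot increase when a pattern is specialized, so no frequent pattern is lost.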