Compile the Hypothesis Space: Do it Once, Use it Often

  • Authors:
  • Nuno A. Fonseca; Rui Camacho; Ricardo Rocha; Vítor Santos Costa

  • Affiliations:
  • (Correspd.) Instituto de Biologia Molecular e Celular (IBMC) & CRACS, University of Porto, Portugal. nf@ibmc.up.pt;FEUP & LIAAD, University of Porto, Portugal. rcamacho@fe.up.pt;DCC-FCUP & CRACS, University of Porto, Portugal. ricroc@dcc.fc.up.pt/ vsc@dcc.fc.up.pt;DCC-FCUP & CRACS, University of Porto, Portugal. ricroc@dcc.fc.up.pt/ vsc@dcc.fc.up.pt

  • Venue:
  • Fundamenta Informaticae - Progress on Multi-Relational Data Mining
  • Year:
  • 2009

Abstract

Inductive Logic Programming (ILP) is a powerful and well-developed abstraction for multi-relational data mining techniques. Despite the considerable success of ILP, deployed ILP systems still have efficiency problems when applied to complex problems. In this paper we propose a novel technique that avoids the procedure of deducing each example to evaluate each constructed clause. The technique is based on the Mode Directed Inverse Entailment approach to ILP, where a bottom clause is generated for each example and the generated clauses are subsets of the literals of that bottom clause. We propose to store in a prefix-tree all clauses that can be generated from all bottom clauses, together with some extra information. We show that this information is sufficient to estimate the number of examples that can be deduced from a clause, and present an ILP algorithm that exploits this representation. We also present an extension of the algorithm where each prefix-tree is computed only once (compiled) per example. The evaluation of hypotheses then requires only basic and efficient operations on trees. This proposal avoids recomputing hypothesis values in theory-level search, in cross-validation evaluation procedures and in parameter tuning. Both proposals are empirically evaluated on real applications, and considerable speedups were observed.
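
The prefix-tree idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ClauseTrie`, `add_bottom_clause` and `coverage`, the `max_len` bound and the toy literals are all hypothetical. It also assumes that literals can be compared as plain strings, so the count it returns is only a syntactic estimate of coverage (a clause is counted for an example when its literals are a subset of that example's bottom clause), with no variable renaming or subsumption testing.

```python
from itertools import combinations


class TrieNode:
    def __init__(self):
        self.children = {}     # literal -> TrieNode
        self.examples = set()  # ids of examples whose bottom clause contains this path


class ClauseTrie:
    """Hypothetical prefix-tree over clause bodies (illustration only)."""

    def __init__(self):
        self.root = TrieNode()

    def add_bottom_clause(self, example_id, body_literals, max_len=2):
        # Assumption: literals are plain strings in a canonical form, so a
        # sorted tuple of literals identifies a clause body uniquely.
        literals = sorted(set(body_literals))
        for k in range(1, max_len + 1):
            for subset in combinations(literals, k):
                node = self.root
                for lit in subset:
                    node = node.children.setdefault(lit, TrieNode())
                    # Every prefix of a sorted subset is itself a subset of
                    # the bottom clause, so the example is recorded here too.
                    node.examples.add(example_id)

    def coverage(self, clause_body, within=None):
        # Walk the tree; no theorem proving is involved at evaluation time.
        node = self.root
        for lit in sorted(set(clause_body)):
            node = node.children.get(lit)
            if node is None:
                return 0
        if within is None:
            return len(node.examples)
        # Restrict the count to a subset of examples, e.g. one training fold.
        return len(node.examples & set(within))


# Toy data: two examples with made-up bottom-clause literals.
trie = ClauseTrie()
trie.add_bottom_clause("e1", ["atom(A,c)", "has_ring(A)"])
trie.add_bottom_clause("e2", ["atom(A,c)", "bond(A,B)"])

both = trie.coverage(["atom(A,c)"])                  # counted for e1 and e2
fold_only = trie.coverage(["atom(A,c)"], within={"e2"})  # counted for e2 only
```

Because the tree is built once and only queried afterwards, the same structure can be reused across cross-validation folds or parameter settings by changing the `within` argument, which mirrors the "compile once, use often" motivation in a simplified form.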