Improving the efficiency of inductive logic programming systems

  • Authors:
  • Nuno A. Fonseca;Vítor Santos Costa;Ricardo Rocha;Rui Camacho;Fernando Silva

  • Affiliations:
  • Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, Rua do Campo Alegre, 823, 4169-007 Porto, Portugal;CRACS & Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 1021-1055, 4169-007 Porto, Portugal;CRACS & Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 1021-1055, 4169-007 Porto, Portugal;LIAAD & Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, s/n, 4200-465 Porto, Portugal;CRACS & Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 1021-1055, 4169-007 Porto, Portugal

  • Venue:
  • Software—Practice & Experience
  • Year:
  • 2009

Abstract

Inductive logic programming (ILP) is a sub-field of machine learning that provides an excellent framework for multi-relational data mining applications. The advantages of ILP have been successfully demonstrated in complex and relevant industrial and scientific problems. However, to produce valuable models, ILP systems often require long running times and large amounts of memory. In this paper we address fundamental issues that have a direct impact on the efficiency of ILP systems. Namely, we discuss how improvements in the indexing mechanisms of an underlying logic programming system benefit ILP performance. Furthermore, we propose novel data structures to reduce memory requirements and we suggest a new lazy evaluation technique to search the hypothesis space more efficiently. These proposals have been implemented in the April ILP system and evaluated using several well-known data sets. The observed results show significant improvements in running time without compromising the accuracy of the models generated. Indeed, the combined techniques achieve speedups of several orders of magnitude in some data sets. Moreover, memory requirements are reduced in nearly half of the data sets. Copyright © 2008 John Wiley & Sons, Ltd.