Feature construction with Inductive Logic Programming: A Study of Quantitative Predictions of Biological Activity Aided by Structural Attributes

  • Authors:
  • Ashwin Srinivasan; Ross D. King

  • Affiliations:
  • Oxford University Computing Laboratory, Oxford, UK. ashwin@comlab.ox.ac.uk; Department of Computer Science, University of Wales, Aberystwyth, Wales, UK. rdk@aber.ac.uk

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 1999

Abstract

Recently, computer programs developed within the field of Inductive Logic Programming (ILP) have received some attention for their ability to construct restricted first-order logic solutions using problem-specific background knowledge. Prominent applications of such programs have been concerned with determining “structure-activity” relationships in the areas of molecular biology and chemistry. Typically the task here is to predict the “activity” of a compound (for example, toxicity) from its chemical structure. A summary of the research in the area is: (a) ILP programs have largely been restricted to qualitative predictions of activity (“high”, “low”, etc.); (b) when appropriate attributes are available, ILP programs have equivalent predictivity to standard quantitative analysis techniques like linear regression. However, ILP programs usually perform better when such attributes are unavailable; and (c) by using structural information as background knowledge, an ILP program can provide comprehensible explanations for biological activity. This paper examines the use of ILP programs as a method of “discovering” new attributes. These attributes could then be used by methods like linear regression, thus allowing for quantitative predictions while retaining the ability to use structural information as background knowledge. Using structure-activity tasks as a test-bed, the utility of ILP programs in constructing new features was evaluated by examining the prediction of biological activity using linear regression, with and without the aid of ILP-learnt logical attributes. In three out of the five data sets examined, the addition of ILP attributes produced statistically better results. In addition, six important structural features that have escaped the attention of the expert chemists were discovered.
The method used here to construct new attributes is not specific to the problem of predicting biological activity, and the results obtained suggest a wider role for ILP programs in aiding the process of scientific discovery.
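
The idea of feeding logical attributes into linear regression can be illustrated with a minimal sketch. It is not the authors' implementation: the molecule encoding, the `nitro_on_ring` predicate (a hand-written stand-in for a clause an ILP program might learn), the `logp` attribute, and all activity values are assumptions invented for illustration. The sketch turns the truth value of the predicate into a 0/1 column and compares least-squares fits with and without it.

```python
# Toy illustration only: the molecules, attribute names, and activity values
# below are invented and are not taken from the paper's data sets.

def nitro_on_ring(mol):
    """Hypothetical stand-in for an ILP-learnt clause over chemical structure."""
    return any(a["in_ring"] and a["group"] == "nitro" for a in mol["atoms"])

def design_row(mol, logical_features):
    """Intercept, one numeric attribute, then a 0/1 column per logical feature."""
    return [1.0, mol["logp"]] + [1.0 if f(mol) else 0.0 for f in logical_features]

def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]

def rss(X, y, coef):
    """Residual sum of squares of a fitted linear model."""
    return sum((t - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, t in zip(X, y))

def atom(group, in_ring):
    return {"group": group, "in_ring": in_ring}

molecules = [
    {"logp": 0.5, "atoms": [atom("nitro", True)]},
    {"logp": 1.0, "atoms": [atom("methyl", True)]},
    {"logp": 1.5, "atoms": [atom("nitro", True), atom("methyl", False)]},
    {"logp": 2.0, "atoms": [atom("methyl", False)]},
    {"logp": 2.5, "atoms": [atom("nitro", False)]},
]
# Invented activities, constructed as 1 + 2*logp + 3*[nitro on ring].
activity = [5.0, 3.0, 7.0, 5.0, 6.0]

X_plain = [design_row(m, []) for m in molecules]
X_ilp = [design_row(m, [nitro_on_ring]) for m in molecules]
coef_plain = fit_least_squares(X_plain, activity)
coef_ilp = fit_least_squares(X_ilp, activity)
print("RSS without logical attribute:", rss(X_plain, activity, coef_plain))
print("RSS with logical attribute:   ", rss(X_ilp, activity, coef_ilp))
```

Because the toy activities were built to depend on the structural predicate, the fit improves once the indicator column is added. On real data, any such improvement would need the kind of statistical comparison the abstract describes.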