A numerical refinement operator based on multi-instance learning

  • Authors:
  • Erick Alphonse;Tobias Girschick;Fabian Buchwald;Stefan Kramer

  • Affiliations:
  • Laboratoire d'Informatique de l'université Paris-Nord, Villetaneuse, France;Technische Universität München, Institut für Informatik, Garching b. München, Germany;Technische Universität München, Institut für Informatik, Garching b. München, Germany;Technische Universität München, Institut für Informatik, Garching b. München, Germany

  • Venue:
  • ILP'10 Proceedings of the 20th international conference on Inductive logic programming
  • Year:
  • 2010

Abstract

We present a numerical refinement operator based on multi-instance learning. In this approach, the task of handling numerical variables in a clause is delegated to statistical multi-instance learning schemes. Each clause has an associated multi-instance classification model that takes the numerical variables of the clause as input. Clauses are built greedily: each refinement adds new numerical variables, which are used in addition to the numerical variables already known to the multi-instance model. In our experiments, we tested this approach with multi-instance learners available in the Weka workbench (such as MISVMs). The resulting clauses are used in a boosting approach that can take advantage of margin information, going beyond standard covering procedures or the discrete boosting of rules, as in SLIPPER. The approach is evaluated on three problems: hexose binding site prediction, a pharmacological application, and mutagenicity prediction. In two of the three applications, the task is to find configurations of points with certain properties in 3D space that characterize either a binding site or drug activity: the logical part of the clause constitutes the points with their properties, whereas the multi-instance model constrains the distances among the points. In summary, the new numerical refinement operator is interesting both theoretically, as a new synthesis of logical and statistical learning, and practically, as a new method for characterizing binding sites and pharmacophores in biochemical applications.
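The greedy construction described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the method of the paper: an example is a bag of instances (one tuple of numerical variable values per way the clause matches the example), the multi-instance learner is a toy nearest-prototype classifier standing in for the Weka learners, and the names `fit_mi_model` and `greedy_refine`, the stopping rule, and the data are hypothetical.

```python
import math

def fit_mi_model(bags, labels, features):
    """Toy multi-instance learner (an assumption, not a Weka MISVM).

    The prototype is the mean of all instances in positive bags, restricted
    to the chosen numerical variables. A bag is predicted positive if at
    least one of its instances lies within a learned distance threshold.
    Returns (predict_function, training_accuracy).
    """
    positives = [inst for bag, y in zip(bags, labels) if y for inst in bag]
    prototype = [sum(inst[f] for inst in positives) / len(positives)
                 for f in features]

    def bag_distance(bag):
        # Distance from the bag's closest instance to the prototype.
        return min(math.dist([inst[f] for f in features], prototype)
                   for inst in bag)

    distances = [bag_distance(bag) for bag in bags]
    best_threshold, best_accuracy = None, -1.0
    # Try each observed distance as threshold; keep the most accurate one.
    for t in sorted(set(distances)):
        correct = sum((d <= t) == y for d, y in zip(distances, labels))
        accuracy = correct / len(bags)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return (lambda bag: bag_distance(bag) <= best_threshold), best_accuracy

def greedy_refine(bags, labels, n_features):
    """Add one numerical variable per step while training accuracy improves
    (the stopping rule is an assumption of this sketch)."""
    selected, best_accuracy = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        # Each refinement keeps the variables already known to the model.
        accuracy, feature = max(
            (fit_mi_model(bags, labels, selected + [f])[1], f)
            for f in candidates)
        if accuracy <= best_accuracy:
            break
        selected.append(feature)
        best_accuracy = accuracy
    return selected, best_accuracy

# Hypothetical data: variable 0 behaves like an informative distance
# (close to 5 in some instance of every positive bag); variable 1 is noise.
bags = [
    [(5.0, 1.0), (9.0, 7.0)],
    [(5.2, 8.0), (1.0, 2.0)],
    [(9.0, 1.5), (1.0, 7.5)],
    [(8.5, 8.0), (0.5, 2.0)],
]
labels = [True, True, False, False]

selected, accuracy = greedy_refine(bags, labels, n_features=2)
predict, _ = fit_mi_model(bags, labels, selected)
print(selected, accuracy)
```

On this data the sketch keeps only variable 0, since adding the noise variable cannot raise the training accuracy any further. In the setting of the abstract, the candidate variables would come from logical refinements of the clause, and the model's margin would feed the boosting step, which this sketch leaves out.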