Extracting Context-Sensitive Models in Inductive Logic Programming

Authors:
Ashwin Srinivasan
Affiliations:
Oxford University Computing Laboratory, Wolfson Building, Parks Road, Oxford OX1 3QD, United Kingdom. ashwin@comlab.ox.ac.uk
Venue:
Machine Learning
Year:
2001

Abstract

Given domain-specific background knowledge and data in the form of examples, an Inductive Logic Programming (ILP) system extracts models in the data-analytic sense. We view the model-selection step facing an ILP system as a decision problem, the solution of which requires knowledge of the context in which the model is to be deployed. In this paper, “context” will be defined by the current specification of the prior class distribution and the client's preferences concerning errors of classification. Within this restricted setting, we consider the use of an ILP system in situations where: (a) contexts can change regularly, for example through changes to class distributions or misclassification costs; and (b) the data are from observational studies, that is, they may not have been collected with any particular context in mind. Some repercussions of these are: (a) any one model may not be the optimal choice for all contexts; and (b) not all the background information provided may be relevant for all contexts. Using results from the analysis of Receiver Operating Characteristic (ROC) curves, we investigate a technique that can equip an ILP system to reject those models that cannot possibly be optimal in any context. We present empirical results from using the technique to analyse two datasets concerned with the toxicity of chemicals (in particular, their mutagenic and carcinogenic properties). Clients can, and typically do, approach such datasets with quite different requirements. For example, a synthetic chemist would require models with a low rate of commission errors, which could be used to direct the synthesis of new compounds efficiently. A toxicologist, on the other hand, would prefer models with a low rate of omission errors. This would enable a more complete identification of toxic chemicals, at a calculated cost of misidentifying some non-toxic cases as toxic.
The approach adopted here attempts to obtain a solution that contains models that are optimal for each such user according to the cost function that he or she wishes to apply. In doing so, it also provides one solution to the problem of how the relevance of background predicates is to be assessed in ILP.
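
The idea of rejecting models that cannot be optimal in any context can be illustrated with a standard result from ROC analysis: a model whose (false positive rate, true positive rate) point lies below the upper convex hull of all candidate points cannot minimise expected cost for any combination of class prior and misclassification costs. The Python sketch below assumes this convex-hull construction; the abstract does not state that the paper uses exactly this procedure. All function names, parameter names, and example numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# A model is summarised by its ROC point: (false positive rate, true positive rate).
Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """2D cross product of vectors o->a and o->b (negative means a right turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Return the upper convex hull of the ROC points, left to right.

    The trivial classifiers (0, 0) and (1, 1) are always included.
    Points not returned are dominated in every context.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Drop the last hull point while it lies on or below the line to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def expected_cost(point: Point, p_pos: float, cost_fp: float, cost_fn: float) -> float:
    """Expected misclassification cost in a context (assumed cost model).

    p_pos   : prior probability of the positive class
    cost_fp : cost of a commission error (false positive)
    cost_fn : cost of an omission error (false negative)
    """
    fpr, tpr = point
    return p_pos * cost_fn * (1.0 - tpr) + (1.0 - p_pos) * cost_fp * fpr


def best_for_context(hull: List[Point], p_pos: float, cost_fp: float, cost_fn: float) -> Point:
    """Pick the hull point with the lowest expected cost for the given context."""
    return min(hull, key=lambda pt: expected_cost(pt, p_pos, cost_fp, cost_fn))


# Hypothetical ROC points for three candidate models.
models = [(0.1, 0.6), (0.3, 0.5), (0.4, 0.9)]
hull = roc_convex_hull(models)

# Context where commission errors are costly (e.g. the synthetic chemist).
chemist_choice = best_for_context(hull, p_pos=0.5, cost_fp=5.0, cost_fn=1.0)
# Context where omission errors are costly (e.g. the toxicologist).
toxicologist_choice = best_for_context(hull, p_pos=0.5, cost_fp=1.0, cost_fn=5.0)
```

In this toy example the model at (0.3, 0.5) falls below the hull and is discarded, while the two remaining models are each preferred in a different context. Keeping every hull point, rather than a single model, is what lets the choice be deferred until the context is known.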