A Study of Two Sampling Methods for Analyzing Large Datasets with ILP

  • Author: Ashwin Srinivasan
  • Affiliation: Oxford University Computing Laboratory, Oxford, UK. ashwin@comlab.ox.ac.uk
  • Venue: Data Mining and Knowledge Discovery
  • Year: 1999


Abstract

This paper is concerned with problems that arise when submitting large quantities of data to analysis by an Inductive Logic Programming (ILP) system. Complexity arguments usually make it prohibitive to analyse such datasets in their entirety. We examine two schemes that allow an ILP system to construct theories by sampling from this large pool of data. The first, “subsampling”, is a single-sample design in which the utility of a potential rule is evaluated on a randomly selected sub-sample of the data. The second, “logical windowing”, is a multiple-sample design that repeatedly tests a partially correct theory and sequentially includes the errors it makes. Both schemes are derived from techniques developed to enable propositional learning methods (like decision trees) to cope with large datasets. The ILP system CProgol, equipped with each of these methods, is used to construct theories for two datasets—one artificial (a chess endgame) and the other naturally occurring (a language tagging problem). In each case, we ask the following questions of CProgol equipped with sampling: (1) Is its theory comparable in predictive accuracy to that obtained if all the data were used (that is, no sampling was employed)?; and (2) Is its theory constructed in less time than the one obtained with all the data? For the problems considered, the answers to these questions are “yes”. This suggests that an ILP program equipped with an appropriate sampling method could satisfactorily address problems that have hitherto been inaccessible simply due to data extent.
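The two sampling designs described in the abstract can be illustrated in miniature. The sketch below is not CProgol's implementation; all names (`subsample_utility`, `logical_windowing`, the toy lookup-table learner) are hypothetical, and it only shows the control flow of each scheme: subsampling scores a candidate rule on a random subset of the examples, while windowing learns on a small window, finds the examples the current theory misclassifies, adds them to the window, and repeats until no errors remain.

```python
import random


def subsample_utility(rule, examples, sample_size, rng):
    """Subsampling: estimate a rule's utility on a random sub-sample
    of the data instead of the entire example pool."""
    sample = rng.sample(examples, min(sample_size, len(examples)))
    correct = sum(1 for x, y in sample if rule(x) == y)
    return correct / len(sample)


def logical_windowing(learn, examples, initial_size, rng):
    """Windowing: learn a theory from a small window of examples, then
    sequentially add the errors the theory makes until none remain."""
    window = rng.sample(examples, min(initial_size, len(examples)))
    while True:
        theory = learn(window)
        errors = [(x, y) for x, y in examples
                  if theory(x) != y and (x, y) not in window]
        if not errors:
            return theory
        window.extend(errors)


if __name__ == "__main__":
    # Toy task: classify integers by parity.
    examples = [(i, i % 2) for i in range(20)]
    rng = random.Random(0)

    # A perfect rule scores 1.0 on any sub-sample.
    print(subsample_utility(lambda x: x % 2, examples, 5, rng))

    # Toy learner: a lookup table over the window, defaulting to label 0.
    learn = lambda window: (lambda x, d=dict(window): d.get(x, 0))
    theory = logical_windowing(learn, examples, 4, rng)
    print(all(theory(x) == y for x, y in examples))
```

With this toy learner the window grows only by the misclassified odd-labelled examples, so the loop terminates after a couple of passes; a real ILP learner would replace the lookup table with clause construction over the window.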