The applicability to ILP of results concerning the ordering of binomial populations

Authors:
Ashwin Srinivasan
Affiliations:
Oxford University Computing Laboratory, Oxford, UK
Venue:
ILP'02 Proceedings of the 12th international conference on Inductive logic programming
Year:
2002

Abstract

Results obtained in the 1950s on the ordering of k binomial populations are directly applicable to computing the size of the data sample required to identify the best amongst k theories in an ILP search. The basic problem considered is that of selecting the best of k (k ≥ 2) binomial populations on the basis of the observed frequency of 'success' in n observations. In the statistical formulation of the problem, the experimenter specifies that he or she would like to guarantee that the probability of correct selection is at least P* (0 ≤ P* < 1) whenever the true probabilities of success of the best and second-best populations differ by at least d* (0 < d* ≤ 1). The principal result of interest is that, for large values of k, the value of n required to meet any fixed specification (d*, P*) is approximately equal to some constant multiple of ln k. Depending on the k, d* and P* specified, this value of n can be substantially lower than that obtained within the corresponding PAC formulation. Applied to the search conducted by an ILP algorithm, the result guarantees, with confidence level P*, that the true accuracy of the theory selected will be no more than d* below that of the best theory. In this paper, we review this basic result and test its predictions empirically. We also comment on the relevance to ILP of more recent work in the field.
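
The selection problem described in the abstract can be explored numerically. The following Python sketch is illustrative only and is not taken from the paper. It makes two assumptions: (1) the PAC comparison uses the textbook finite-hypothesis-space bound for a consistent learner, n ≥ (1/ε)(ln k + ln(1/δ)), which may not be the bound the paper uses; (2) the simulation places the best population at success probability 0.5 + d*/2 and the remaining k - 1 populations at 0.5 - d*/2, chosen here purely for illustration. The function names `pac_sample_size` and `prob_correct_selection` are hypothetical.

```python
import math
import random


def pac_sample_size(k, epsilon, delta):
    """Textbook PAC bound for a consistent learner over k hypotheses
    (an assumed stand-in, not necessarily the paper's bound):
    n >= (1/epsilon) * (ln k + ln(1/delta))."""
    return math.ceil((math.log(k) + math.log(1.0 / delta)) / epsilon)


def prob_correct_selection(k, n, d_star, trials=2000, seed=0):
    """Monte Carlo estimate of the probability of correct selection when
    the population with the most successes in n observations is chosen
    and ties are broken uniformly at random.

    Assumed configuration (illustrative): the best population succeeds
    with probability 0.5 + d*/2, the other k - 1 with 0.5 - d*/2.
    """
    rng = random.Random(seed)
    p_best = 0.5 + d_star / 2.0
    p_rest = 0.5 - d_star / 2.0
    correct = 0.0
    for _ in range(trials):
        # Number of successes observed for the best population.
        best = sum(rng.random() < p_best for _ in range(n))
        # Number of successes for each of the other k - 1 populations.
        rest = [sum(rng.random() < p_rest for _ in range(n))
                for _ in range(k - 1)]
        top = max(rest)
        if best > top:
            correct += 1.0
        elif best == top:
            # The best population shares the lead; credit the chance
            # that a uniform random tie-break picks it.
            correct += 1.0 / (1 + rest.count(top))
    return correct / trials
```

One way to use the sketch is to fix k, d* and P*, increase n until `prob_correct_selection(k, n, d_star)` first reaches P*, and compare that n with `pac_sample_size(k, d_star, 1 - P_star)`. Repeating this for several values of k gives a rough empirical check on the claimed ln k growth.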