The applicability to ILP of results concerning the ordering of binomial populations

  • Authors:
  • Ashwin Srinivasan

  • Affiliations:
  • Oxford University Computing Laboratory, Oxford, UK

  • Venue:
  • ILP '02: Proceedings of the 12th International Conference on Inductive Logic Programming
  • Year:
  • 2002

Abstract

Results obtained in the 1950s on the ordering of k binomial populations are directly applicable to computing the size of the data sample required to identify the best amongst k theories in an ILP search. The basic problem considered is that of selecting the best one of k (k ≥ 2) binomial populations on the basis of the observed frequency of 'success' on n observations. In the statistical formulation of the problem, the experimenter specifies that he or she would like to guarantee that the probability of correct selection is at least P* (0 ≤ P* < 1) whenever the difference between the best and second-best populations is at least d* (0 < d* ≤ 1). The principal result of interest is that for large values of k, the value of n required to meet any fixed specification (d*, P*) is approximately equal to some constant multiple of ln k. Depending on the k, d* and P* specified, this value of n can be substantially lower than that obtained within the corresponding PAC formulation. Applied to the search conducted by an ILP algorithm, the result guarantees, with confidence level P*, that the true accuracy of the theory selected will be no more than d* below that of the best theory. In this paper, we review this basic result and test its predictions empirically. We also comment on the relevance to ILP of more recent work in the field.
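
The selection problem described in the abstract can be illustrated by simulation. The following is a minimal sketch, not the paper's method. It assumes a configuration in which one population has success probability p_best and the other k - 1 populations all have success probability p_best - d*, and it estimates by Monte Carlo the probability that the population with the highest observed frequency on n observations is the truly best one. The function names, the default p_best = 0.75, and the random tie-breaking rule are illustrative assumptions.

```python
import random


def estimate_pcs(k, n, d_star, p_best=0.75, trials=2000, seed=0):
    """Monte Carlo estimate of the probability of correct selection.

    Assumed configuration (hypothetical, for illustration only):
    one population succeeds with probability p_best, and the other
    k - 1 populations succeed with probability p_best - d_star.
    """
    rng = random.Random(seed)
    p_rest = p_best - d_star
    correct = 0.0
    for _ in range(trials):
        # Number of successes observed in n draws from the best population.
        best_count = sum(rng.random() < p_best for _ in range(n))
        # Number of successes for each of the k - 1 inferior populations.
        rest_counts = [
            sum(rng.random() < p_rest for _ in range(n))
            for _ in range(k - 1)
        ]
        top = max(rest_counts)
        if best_count > top:
            correct += 1.0
        elif best_count == top:
            # Ties are assumed to be broken uniformly at random, so the
            # best population is credited with its chance of being picked.
            correct += 1.0 / (rest_counts.count(top) + 1)
    return correct / trials


def smallest_n(k, d_star, p_star, n_values, **kwargs):
    """Return the first n in n_values whose estimated PCS reaches p_star,
    or None if no candidate does."""
    for n in n_values:
        if estimate_pcs(k, n, d_star, **kwargs) >= p_star:
            return n
    return None


if __name__ == "__main__":
    # Print the smallest candidate n meeting (d*, P*) = (0.2, 0.9) for
    # several k, so that the growth of n with k can be inspected.
    for k in (2, 4, 8, 16):
        n = smallest_n(k, 0.2, 0.9, range(5, 205, 5), trials=500)
        print(k, n)
```

Running the script prints, for each k, the smallest candidate n that meets the specification under these assumptions. The estimates are noisy for small numbers of trials, so they give only a rough indication of how n grows with k.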