Results obtained in the 1950s on the ordering of k binomial populations are directly applicable to computing the size of the data sample required to identify the best among k theories in an ILP search. The basic problem considered is that of selecting the best of k (k ≥ 2) binomial populations on the basis of the observed frequency of 'success' in n observations. In the statistical formulation of the problem, the experimenter specifies that he or she would like to guarantee that the probability of correct selection is at least P* (0 ≤ P* < 1) whenever the true difference between the success probabilities of the best and second-best populations is at least d* (0 < d* ≤ 1). The principal result of interest is that, for large values of k, the value of n required to meet any fixed specification (d*, P*) is approximately equal to a constant multiple of ln k. Depending on the k, d* and P* specified, this value of n can be substantially lower than that obtained within the corresponding PAC formulation. Applied to the search conducted by an ILP algorithm, the result guarantees, with confidence level P*, that the true accuracy of the theory selected will be no more than d* below that of the best theory. In this paper, we review this basic result and test its predictions empirically. We also comment on the relevance to ILP of more recent work in the field.
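To make the logarithmic dependence on k concrete, the following is a minimal sketch that computes a sufficient sample size from a Hoeffding bound combined with a union bound over the k theories. This is an assumption made for illustration only: it is not the 1950s ranking-and-selection result reviewed in the paper, and it is generally more conservative than that result. The function name `hoeffding_sample_size` and its parameters are hypothetical. The reasoning is that if every observed success frequency lies within d*/2 of its true value, the theory with the highest observed frequency has true accuracy no more than d* below the best; requiring this to hold with probability at least P* gives n ≥ (2 / d*²) ln(2k / (1 − P*)).

```python
import math


def hoeffding_sample_size(k: int, d_star: float, p_star: float) -> int:
    """Sufficient n per theory under a Hoeffding + union bound (illustrative only).

    This is NOT the ranking-and-selection result discussed in the text; it is a
    simpler, more conservative bound that shows the same ln(k) growth.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if not 0.0 < d_star <= 1.0:
        raise ValueError("d_star must satisfy 0 < d_star <= 1")
    if not 0.0 <= p_star < 1.0:
        raise ValueError("p_star must satisfy 0 <= p_star < 1")

    delta = 1.0 - p_star  # allowed probability of an incorrect selection
    # Hoeffding: P(|observed - true| > d*/2) <= 2 * exp(-n * d*^2 / 2) per theory.
    # Union bound over k theories: 2 * k * exp(-n * d*^2 / 2) <= delta.
    n = (2.0 / d_star ** 2) * math.log(2.0 * k / delta)
    return math.ceil(n)


if __name__ == "__main__":
    # Each doubling of k adds roughly (2 / d*^2) * ln 2 observations.
    for k in (10, 100, 1000, 10000):
        print(k, hoeffding_sample_size(k, d_star=0.1, p_star=0.95))
```

Under this bound, each doubling of k adds about (2 / d*²) ln 2 observations, which is the same qualitative behaviour as the "constant multiple of ln k" stated above, although the constant differs from the one in the reviewed result.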