On polynomial-time probably almost discriminative learnability
COLT '93 Proceedings of the sixth annual conference on Computational learning theory
This paper develops a new computational model for learning stochastic rules (i.e., conditional probabilities over the set of labels for given instances) on the basis of statistical hypothesis testing theory, and derives bounds on the sample complexities required for learning. The model addresses the problem of determining whether or not a given class of stochastic rules is probably almost discriminatively (PAD) learnable, in the sense that one can discriminate, with low computational complexity and with high probability, between any pair of stochastic rules in the class by testing from which member of the pair a given test sequence originated.

In the proposed model, we construct new discrimination functions on the basis of the minimum description length (MDL) principle. We then derive upper bounds on the smallest training sample size and test sample size that these discrimination functions require in order to guarantee that, for any pair of rules in a given class, the two types of error probabilities are less than δ1 and δ2 respectively, provided the distance between the two rules to be discriminated is at least ε. As corollaries, we derive sample size bounds for PAD-learning of stochastic decision lists with at most k literals in each term and of stochastic decision trees of depth at most k log n (k fixed). A sufficient condition for polynomial-time PAD learnability of a given class is also given in terms of the existence of an algorithm that approximates the minimum description length.
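To make the discrimination idea concrete, the following is a minimal sketch (not the paper's actual discrimination functions, which also account for hypothesis complexity and finite training samples): it attributes a labeled test sequence to whichever of two candidate stochastic rules gives it the shorter code length, i.e. the smaller negative log-likelihood. The rules `rule_a` and `rule_b` and the sequence are hypothetical toy examples.

```python
import math

def code_length(rule, sequence):
    """Code length (bits) of a labeled sequence under a stochastic rule,
    i.e. -log2 of the rule's conditional likelihood of the labels."""
    total = 0.0
    for x, y in sequence:
        p = rule(x) if y == 1 else 1.0 - rule(x)
        total += -math.log2(p)
    return total

def discriminate(rule_a, rule_b, sequence):
    """MDL-style test: attribute the test sequence to whichever rule
    assigns it the shorter description length."""
    if code_length(rule_a, sequence) <= code_length(rule_b, sequence):
        return "A"
    return "B"

# Two toy stochastic rules giving P(label = 1 | instance x):
rule_a = lambda x: 0.9 if x == 1 else 0.2   # labels strongly follow x
rule_b = lambda x: 0.5                      # labels independent of x

# A test sequence of (instance, label) pairs resembling rule_a's behavior:
seq = [(1, 1), (1, 1), (0, 0), (1, 1), (0, 0), (0, 1), (1, 1)]
print(discriminate(rule_a, rule_b, seq))    # → A
```

The two error probabilities in the abstract correspond to this test misattributing a sequence generated by rule A to rule B and vice versa; the paper's bounds state how large the samples must be for both errors to stay below δ1 and δ2 whenever the rules are at least ε apart.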