Probably Almost Discriminative Learning

  • Authors:
  • Kenji Yamanishi

  • Affiliations:
  • NEC Research Institute, Inc., 4 Independence Way, Princeton NJ 08540. yamanisi@research.nj.nec.com

  • Venue:
  • Machine Learning
  • Year:
  • 1995

Abstract

This paper develops a new computational model for learning stochastic rules, called the PAD (Probably Almost Discriminative) learning model, based on statistical hypothesis testing theory. The model addresses the problem of designing a discrimination algorithm that tests whether or not a given test sequence of (instance, label) examples was generated by a given stochastic rule $P^*$. Here nothing is known about the composite hypothesis $\tilde{P}$ except that it belongs to a given class $\mathcal{C}$. Within this model, we propose a new discrimination algorithm based on the MDL (Minimum Description Length) principle, and then derive upper bounds on the least test sample size the algorithm requires to guarantee that the two types of error probabilities are less than $\delta_1$ and $\delta_2$, respectively, provided that the distance between the two rules to be discriminated is at least $\epsilon$. For the parametric case, where $\mathcal{C}$ is a parametric class, this paper shows that an upper bound on the test sample size is given by

$$O\!\left(\frac{1}{\epsilon}\ln\frac{1}{\delta_1} \;+\; \frac{1}{\epsilon^2}\ln\frac{1}{\delta_2} \;+\; \frac{\tilde{k}}{\epsilon}\ln\frac{\tilde{k}}{\epsilon} \;+\; \frac{\ell(\tilde{M})}{\epsilon}\right),$$

where $\tilde{k}$ is the number of real-valued parameters of the composite hypothesis $\tilde{P}$, and $\ell(\tilde{M})$ is the description length of the countable model for $\tilde{P}$. Further, this paper shows that the MDL-based discrimination algorithm performs well in the sense of sample-complexity efficiency, comparing it with other kinds of information-criteria-based discrimination algorithms.

This paper also shows how to transform any stochastic PAC (Probably Approximately Correct) learning algorithm into a PAD learning algorithm. For the non-parametric case, where $\mathcal{C}$ is a non-parametric class but the discrimination algorithm uses a parametric class, this paper demonstrates that the sample complexity bound for the MDL-based discrimination algorithm is essentially related to Barron and Cover's index of resolvability. This bound gives a new view of the relationship between the index of resolvability and the MDL principle from the PAD-learning viewpoint.
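To make the flavor of MDL-based discrimination concrete, here is a minimal, hypothetical sketch (not the paper's algorithm): the parametric class $\mathcal{C}$ is reduced to a one-parameter Bernoulli family over labels, and a test sample is judged to have come from a given rule $P^*$ only if encoding the labels directly under $P^*$ is no longer than the two-part MDL code (data given the fitted parameter, plus $\frac{k}{2}\log_2 n$ bits for the parameter). All function names and the toy model are assumptions made for illustration.

```python
import math

def bernoulli_codelen(ys, p):
    # Codelength in bits of a binary label sequence under Bernoulli(p):
    # -log2 of the likelihood, with p clamped away from 0 and 1.
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    ones = sum(ys)
    return -(ones * math.log2(p) + (len(ys) - ones) * math.log2(1.0 - p))

def mdl_discriminate(ys, p_star, k=1):
    # Hypothetical discrimination test: compare the codelength of the labels
    # under the fixed rule P* = Bernoulli(p_star) against the two-part MDL
    # codelength using the best rule in the one-parameter Bernoulli class.
    n = len(ys)
    null_len = bernoulli_codelen(ys, p_star)
    p_hat = sum(ys) / n  # maximum-likelihood estimate within the class
    # Two-part code: data given p_hat, plus (k/2) log2 n bits to describe p_hat.
    alt_len = bernoulli_codelen(ys, p_hat) + 0.5 * k * math.log2(n)
    # Accept "came from P*" iff the null code compresses at least as well.
    return "accept P*" if null_len <= alt_len else "reject P*"
```

For example, a balanced sequence of 100 labels is accepted when tested against $p^* = 0.5$ but rejected against $p^* = 0.9$; the $\frac{k}{2}\log_2 n$ parameter cost is what keeps the fitted alternative from winning on small fluctuations alone.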