Probably Almost Discriminative Learning

  • Authors:
  • Kenji Yamanishi

  • Affiliations:
  • NEC Research Institute, Inc., 4 Independence Way, Princeton NJ 08540. yamanisi@research.nj.nec.com

  • Venue:
  • Machine Learning
  • Year:
  • 1995

Abstract

This paper develops a new computational model for learning stochastic rules, called the PAD (Probably Almost Discriminative) learning model, based on statistical hypothesis testing theory. The model addresses the problem of designing a discrimination algorithm that tests whether or not a given test sequence of (instance, label) examples was generated by a given stochastic rule $P^*$. Here nothing is known about the composite hypothesis $\tilde{P}$ except that it belongs to a given class $\mathcal{C}$. Within this model, we propose a new discrimination algorithm based on the MDL (Minimum Description Length) principle, and then derive upper bounds on the least test sample size the algorithm requires to guarantee that the two types of error probabilities are less than $\delta_1$ and $\delta_2$, respectively, provided that the distance between the two rules to be discriminated is at least $\epsilon$. For the parametric case, where $\mathcal{C}$ is a parametric class, this paper shows that an upper bound on the test sample size is given by

$$O\!\left(\frac{1}{\epsilon}\ln\frac{1}{\delta_1} \;+\; \frac{1}{\epsilon^2}\ln\frac{1}{\delta_2} \;+\; \frac{\tilde{k}}{\epsilon}\ln\frac{\tilde{k}}{\epsilon} \;+\; \frac{\ell(\tilde{M})}{\epsilon}\right),$$

where $\tilde{k}$ is the number of real-valued parameters of the composite hypothesis $\tilde{P}$, and $\ell(\tilde{M})$ is the description length of the countable model for $\tilde{P}$. Further, this paper shows that the MDL-based discrimination algorithm performs well in the sense of sample-complexity efficiency, comparing it with other kinds of information-criteria-based discrimination algorithms.

This paper also shows how to transform any stochastic PAC (Probably Approximately Correct) learning algorithm into a PAD learning algorithm. For the non-parametric case, where $\mathcal{C}$ is a non-parametric class but the discrimination algorithm uses a parametric class, this paper demonstrates that the sample complexity bound for the MDL-based discrimination algorithm is essentially related to Barron and Cover's index of resolvability. This bound gives a new view of the relationship between the index of resolvability and the MDL principle from the PAD-learning viewpoint.
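To make the flavor of MDL-based discrimination concrete, here is a minimal, hypothetical sketch (not the paper's algorithm): the parametric class $\mathcal{C}$ is reduced to a one-parameter Bernoulli family over labels, and a test sample is judged to have come from a given rule $P^*$ only if encoding the labels directly under $P^*$ is no longer than the two-part MDL code (data given the fitted parameter, plus $\frac{k}{2}\log_2 n$ bits for the parameter). All function names and the toy model are assumptions made for illustration.

```python
import math

def bernoulli_codelen(ys, p):
    # Codelength in bits of a binary label sequence under Bernoulli(p):
    # -log2 of the likelihood, with p clamped away from 0 and 1.
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    ones = sum(ys)
    return -(ones * math.log2(p) + (len(ys) - ones) * math.log2(1.0 - p))

def mdl_discriminate(ys, p_star, k=1):
    # Hypothetical discrimination test: compare the codelength of the labels
    # under the fixed rule P* = Bernoulli(p_star) against the two-part MDL
    # codelength using the best rule in the one-parameter Bernoulli class.
    n = len(ys)
    null_len = bernoulli_codelen(ys, p_star)
    p_hat = sum(ys) / n  # maximum-likelihood estimate within the class
    # Two-part code: data given p_hat, plus (k/2) log2 n bits to describe p_hat.
    alt_len = bernoulli_codelen(ys, p_hat) + 0.5 * k * math.log2(n)
    # Accept "came from P*" iff the null code compresses at least as well.
    return "accept P*" if null_len <= alt_len else "reject P*"
```

For example, a balanced sequence of 100 labels is accepted when tested against $p^* = 0.5$ but rejected against $p^* = 0.9$; the $\frac{k}{2}\log_2 n$ parameter cost is what keeps the fitted alternative from winning on small fluctuations alone.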