Predicting a binary sequence almost as well as the optimal biased coin
COLT '96 Proceedings of the ninth annual conference on Computational learning theory
A randomized approximation of the MDL for stochastic models with hidden variables
COLT '96 Proceedings of the ninth annual conference on Computational learning theory
Analysis of two gradient-based algorithms for on-line regression
COLT '97 Proceedings of the tenth annual conference on Computational learning theory
Distributed cooperative Bayesian learning strategies
COLT '97 Proceedings of the tenth annual conference on Computational learning theory
Minimax relative loss analysis for sequential prediction algorithms using parametric hypotheses
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
The robustness of the p-norm algorithms
COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
Minimax regret under log loss for general classes of experts
COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
Viewing all models as “probabilistic”
COLT '99 Proceedings of the twelfth annual conference on Computational learning theory
Text classification using ESC-based stochastic decision lists
Proceedings of the eighth international conference on Information and knowledge management
Mining from open answers in questionnaire data
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Worst-Case Bounds for the Logarithmic Loss of Predictors
Machine Learning
Algebraic geometrical methods for hierarchical learning machines
Neural Networks
Mining Open Answers in Questionnaire Data
IEEE Intelligent Systems
Text classification using ESC-based stochastic decision lists
Information Processing and Management: An International Journal
Extended Stochastic Complexity and Minimax Relative Loss Analysis
ALT '99 Proceedings of the 10th International Conference on Algorithmic Learning Theory
The Last-Step Minimax Algorithm
ALT '00 Proceedings of the 11th International Conference on Algorithmic Learning Theory
On-Line Estimation of Hidden Markov Model Parameters
DS '00 Proceedings of the Third International Conference on Discovery Science
Mining product reputations on the Web
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Predicting a binary sequence almost as well as the optimal biased coin
Information and Computation
The subspace information criterion for infinite dimensional hypothesis spaces
The Journal of Machine Learning Research
The Robustness of the p-Norm Algorithms
Machine Learning
Optimality of universal Bayesian sequence prediction for general loss and alphabet
The Journal of Machine Learning Research
Tracking dynamics of topic trends using a finite mixture model
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Key semantics extraction by dependency tree mining
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Subspace Information Criterion for Model Selection
Neural Computation
Algebraic Analysis for Nonidentifiable Learning Machines
Neural Computation
A lower bound on compression of unknown alphabets
Theoretical Computer Science
Relative loss bounds for on-line density estimation with the exponential family of distributions
UAI '99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Rissanen (1978) introduced stochastic complexity to quantify the amount of information in a given data sequence relative to a given hypothesis class of probability densities, where information is measured in terms of the logarithmic loss associated with universal data compression. This paper introduces the notion of extended stochastic complexity (ESC) and demonstrates its effectiveness in the design and analysis of learning algorithms in both on-line prediction and batch-learning scenarios. ESC can be thought of as an extension of Rissanen's stochastic complexity to the decision-theoretic setting, where a general real-valued function serves as a hypothesis and a general loss function serves as a distortion measure. As an application of ESC to on-line prediction, the paper shows that a sequential realization of ESC yields an on-line prediction algorithm, Vovk's aggregating strategy, which can be regarded as an extension of the Bayes algorithm. We derive upper bounds on the cumulative loss of the aggregating strategy, in both expected and worst-case forms, for the case where the hypothesis class is continuous. As an application of ESC to batch learning, the paper shows that a batch approximation of ESC induces a batch-learning algorithm, the minimum L-complexity algorithm (MLC), which extends the minimum description length (MDL) principle. We derive upper bounds on the statistical risk of MLC that are the tightest obtained to date. Through ESC, we give a unifying view of the most effective learning algorithms that have been explored in computational learning theory.
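To make the connection to the Bayes algorithm concrete: under logarithmic loss, the aggregating strategy with learning rate 1 reduces to sequential prediction with a Bayes mixture over the hypothesis class. The sketch below is a minimal illustration, assuming a finite expert class, binary outcomes, and log loss (all names such as `aggregating_strategy` and `expert_probs` are illustrative, not from the paper); it is not the paper's general construction, which also covers continuous hypothesis classes and general loss functions.

```python
import numpy as np

def aggregating_strategy(expert_probs, outcomes, eta=1.0):
    """Aggregating-strategy sketch for binary prediction under log loss.

    expert_probs: (T, K) array; expert_probs[t, k] is expert k's predicted
                  probability that outcome t equals 1.
    outcomes:     length-T array of 0/1 outcomes.
    With eta = 1 and log loss this coincides with the Bayes mixture.
    Returns the master's predictions and its cumulative log loss.
    """
    T, K = expert_probs.shape
    log_w = np.full(K, -np.log(K))        # uniform prior over experts
    preds, cum_loss = [], 0.0
    for t in range(T):
        w = np.exp(log_w - log_w.max())   # normalize in log space for stability
        w /= w.sum()
        p = float(w @ expert_probs[t])    # mixture (weighted-average) prediction
        preds.append(p)
        y = outcomes[t]
        cum_loss += -np.log(p if y == 1 else 1.0 - p)
        # multiplicative update: down-weight each expert by exp(-eta * its loss)
        losses = -np.log(np.where(y == 1, expert_probs[t], 1.0 - expert_probs[t]))
        log_w -= eta * losses
    return np.array(preds), cum_loss

# Example: experts are fixed biased coins. The strategy's cumulative log loss
# exceeds that of the best single coin by at most log K (here log 5).
rng = np.random.default_rng(0)
biases = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
outcomes = rng.binomial(1, 0.7, size=200)
expert_probs = np.tile(biases, (200, 1))
preds, loss = aggregating_strategy(expert_probs, outcomes)
```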