Foster and Vovk proved relative loss bounds for linear regression, where the total loss of the on-line algorithm minus the total loss of the best linear predictor (chosen in hindsight) grows logarithmically with the number of trials. We give similar bounds for temporal-difference learning. Learning takes place in a sequence of trials in which the learner tries to predict discounted sums of future reinforcement signals. The quality of the predictions is measured with the square loss, and we bound the total loss of the on-line algorithm minus the total loss of the best linear predictor for the whole sequence of trials. Again, the difference of the losses grows logarithmically in the number of trials. The bounds hold for an arbitrary (worst-case) sequence of examples. We also give a bound on the expected difference for the case in which the instances are drawn from an unknown distribution. For linear regression, a corresponding lower bound shows that this expected bound cannot be improved substantially.
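The regret quantity described above can be made concrete with a small simulation. The sketch below is only illustrative and is not the paper's algorithm: it uses plain gradient descent on the square loss as the on-line learner (the paper analyzes a least-squares-style method), synthetic features and rewards, and it computes the discounted-return targets offline for simplicity, whereas in temporal-difference learning the targets are revealed only implicitly through future reinforcement signals. All variable names and parameter values are assumptions for this demo.

```python
# Illustrative sketch (not the paper's algorithm): on-line linear prediction
# of discounted sums of future reinforcement signals, and the regret against
# the best linear predictor chosen in hindsight.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9          # discount factor (assumed value)
T, d = 200, 3        # number of trials, feature dimension (assumed values)

X = rng.normal(size=(T, d))                  # instances x_1, ..., x_T
w_true = np.array([1.0, -0.5, 0.25])         # synthetic generating weights
r = X @ w_true + 0.1 * rng.normal(size=T)    # reinforcement signals

# Targets: discounted sums of future reinforcements,
# y_t = sum_{k >= t} gamma^(k - t) * r_k, computed by a backward pass.
y = np.zeros(T)
acc = 0.0
for t in range(T - 1, -1, -1):
    acc = r[t] + gamma * acc
    y[t] = acc

# On-line learner: gradient descent on the square loss (stand-in learner).
w = np.zeros(d)
eta = 0.05
online_loss = 0.0
for t in range(T):
    pred = w @ X[t]
    online_loss += (pred - y[t]) ** 2
    w -= eta * (pred - y[t]) * X[t]

# Best linear predictor in hindsight: ordinary least squares on all trials.
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
best_loss = float(np.sum((X @ w_star - y) ** 2))

# The abstract bounds this difference; for the paper's algorithm it grows
# only logarithmically with the number of trials T.
regret = online_loss - best_loss
print(f"online={online_loss:.2f} best={best_loss:.2f} regret={regret:.2f}")
```

Since the hindsight predictor minimizes the total square loss over the whole sequence, the regret is always nonnegative; the theorems in the paper control how fast it can grow as T increases.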