Matrix analysis.
Temporal difference learning and TD-Gammon. Communications of the ACM.
Linear least-squares algorithms for temporal difference learning. Machine Learning (special issue on reinforcement learning).
Natural gradient works efficiently in learning. Neural Computation.
Analytical Mean Squared Error Curves for Temporal Difference Learning. Machine Learning.
Introduction to Reinforcement Learning.
Neuro-Dynamic Programming.
Technical Update: Least-Squares Temporal Difference Learning. Machine Learning.
Least Squares Policy Evaluation Algorithms with Linear Function Approximation. Discrete Event Dynamic Systems.
Learning to Predict by the Methods of Temporal Differences. Machine Learning.
Actor-critic algorithms.
Bias and variance in value function estimation. ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning.
Estimating Functions for Blind Separation When Sources Have Variance Dependencies. The Journal of Machine Learning Research.
On-line learning for very large data sets. Applied Stochastic Models in Business and Industry.
Bias and Variance Approximation in Value Function Estimates. Management Science.
An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators. Proceedings of the 25th International Conference on Machine Learning.
Fast gradient-descent methods for temporal-difference learning with linear function approximation. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
Incremental least-squares temporal difference learning. AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence, Volume 1.
A reinforcement learning approach to job-shop scheduling. IJCAI'95: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2.
Blind source separation: semiparametric statistical approach. IEEE Transactions on Signal Processing.
Since the invention of temporal difference (TD) learning (Sutton, 1988), many new algorithms for model-free policy evaluation have been proposed. Although they have brought much progress in practical applications of reinforcement learning (RL), fundamental questions remain about the statistical properties of value function estimation. To address these questions, we introduce a new framework, semiparametric statistical inference, to model-free policy evaluation. This framework generalizes TD learning and its extensions, and allows us to investigate the statistical properties of both batch and online learning procedures for value function estimation in a unified way, in terms of estimating functions. Furthermore, based on this framework, we derive an optimal estimating function with the minimum asymptotic variance, and we propose batch and online learning algorithms that achieve this optimality.
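To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of the two kinds of model-free policy evaluation the abstract contrasts: an online TD(0) update and a batch least-squares (LSTD) solve, both fitting a linear value function V(s) = phi(s)^T theta. The toy three-state Markov chain, step size, and variable names are all illustrative assumptions.

```python
import numpy as np

# Toy fixed-policy Markov chain (an assumption for illustration only).
rng = np.random.default_rng(0)
gamma = 0.9
n_states = 3
P = np.array([[0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8],
              [0.8, 0.1, 0.1]])   # transition matrix under the policy
r = np.array([0.0, 0.0, 1.0])     # expected reward per state
phi = np.eye(n_states)            # tabular (one-hot) features

# Ground truth for comparison: V = (I - gamma P)^{-1} r
v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)

# Sample one long trajectory from the chain.
T = 20000
s = 0
states, rewards = [], []
for _ in range(T):
    states.append(s)
    rewards.append(r[s])
    s = rng.choice(n_states, p=P[s])
states.append(s)

# Online TD(0): theta += alpha * (r + gamma*V(s') - V(s)) * phi(s)
theta_td = np.zeros(n_states)
alpha = 0.05
for t in range(T):
    s, s2 = states[t], states[t + 1]
    delta = rewards[t] + gamma * theta_td @ phi[s2] - theta_td @ phi[s]
    theta_td += alpha * delta * phi[s]

# Batch LSTD: solve A theta = b, with
# A = sum_t phi(s_t) (phi(s_t) - gamma*phi(s_{t+1}))^T,  b = sum_t phi(s_t) r_t
A = np.zeros((n_states, n_states))
b = np.zeros(n_states)
for t in range(T):
    s, s2 = states[t], states[t + 1]
    A += np.outer(phi[s], phi[s] - gamma * phi[s2])
    b += phi[s] * rewards[t]
theta_lstd = np.linalg.solve(A, b)

print("true:", np.round(v_true, 2))
print("TD(0):", np.round(theta_td, 2))
print("LSTD:", np.round(theta_lstd, 2))
```

Both estimators solve the same policy-evaluation problem from the same data; the statistical comparison of such procedures (bias, asymptotic variance, and the choice of estimating function) is exactly what the semiparametric framework above is built to analyze.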