Matrix analysis.
Temporal difference learning and TD-Gammon. Communications of the ACM.
Linear least-squares algorithms for temporal difference learning. Machine Learning (special issue on reinforcement learning).
Natural gradient works efficiently in learning. Neural Computation.
Analytical Mean Squared Error Curves for Temporal Difference Learning. Machine Learning.
Introduction to Reinforcement Learning.
Neuro-Dynamic Programming.
Technical Update: Least-Squares Temporal Difference Learning. Machine Learning.
Least Squares Policy Evaluation Algorithms with Linear Function Approximation. Discrete Event Dynamic Systems.
Learning to Predict by the Methods of Temporal Differences. Machine Learning.
Actor-critic algorithms.
Bias and variance in value function estimation. ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning.
Estimating Functions for Blind Separation When Sources Have Variance Dependencies. The Journal of Machine Learning Research.
On-line learning for very large data sets. Applied Stochastic Models in Business and Industry.
Bias and Variance Approximation in Value Function Estimates. Management Science.
An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators. Proceedings of the 25th International Conference on Machine Learning.
Fast gradient-descent methods for temporal-difference learning with linear function approximation. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning.
Incremental least-squares temporal difference learning. AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence, Volume 1.
A reinforcement learning approach to job-shop scheduling. IJCAI'95: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Volume 2.
Blind source separation: semiparametric statistical approach. IEEE Transactions on Signal Processing.
Since the invention of temporal difference (TD) learning (Sutton, 1988), many new algorithms for model-free policy evaluation have been proposed. Although they have brought much progress in practical applications of reinforcement learning (RL), fundamental questions remain about the statistical properties of value function estimation. To address these questions, we introduce a new framework, semiparametric statistical inference, to model-free policy evaluation. This framework generalizes TD learning and its extensions, and allows us to investigate the statistical properties of both batch and online learning procedures for value function estimation in a unified way, in terms of estimating functions. Furthermore, based on this framework, we derive an optimal estimating function with the minimum asymptotic variance, and we propose batch and online learning algorithms that achieve this optimality.
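To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of the two kinds of model-free policy evaluation the abstract contrasts: an online TD(0) update and a batch least-squares (LSTD) solve, both fitting a linear value function V(s) = phi(s)^T theta. The toy three-state Markov chain, step size, and variable names are all illustrative assumptions.

```python
import numpy as np

# Toy fixed-policy Markov chain (an assumption for illustration only).
rng = np.random.default_rng(0)
gamma = 0.9
n_states = 3
P = np.array([[0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8],
              [0.8, 0.1, 0.1]])   # transition matrix under the policy
r = np.array([0.0, 0.0, 1.0])     # expected reward per state
phi = np.eye(n_states)            # tabular (one-hot) features

# Ground truth for comparison: V = (I - gamma P)^{-1} r
v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)

# Sample one long trajectory from the chain.
T = 20000
s = 0
states, rewards = [], []
for _ in range(T):
    states.append(s)
    rewards.append(r[s])
    s = rng.choice(n_states, p=P[s])
states.append(s)

# Online TD(0): theta += alpha * (r + gamma*V(s') - V(s)) * phi(s)
theta_td = np.zeros(n_states)
alpha = 0.05
for t in range(T):
    s, s2 = states[t], states[t + 1]
    delta = rewards[t] + gamma * theta_td @ phi[s2] - theta_td @ phi[s]
    theta_td += alpha * delta * phi[s]

# Batch LSTD: solve A theta = b, with
# A = sum_t phi(s_t) (phi(s_t) - gamma*phi(s_{t+1}))^T,  b = sum_t phi(s_t) r_t
A = np.zeros((n_states, n_states))
b = np.zeros(n_states)
for t in range(T):
    s, s2 = states[t], states[t + 1]
    A += np.outer(phi[s], phi[s] - gamma * phi[s2])
    b += phi[s] * rewards[t]
theta_lstd = np.linalg.solve(A, b)

print("true:", np.round(v_true, 2))
print("TD(0):", np.round(theta_td, 2))
print("LSTD:", np.round(theta_lstd, 2))
```

Both estimators solve the same policy-evaluation problem from the same data; the statistical comparison of such procedures (bias, asymptotic variance, and the choice of estimating function) is exactly what the semiparametric framework above is built to analyze.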