An important theoretical issue in reinforcement learning is a rigorous understanding of the statistical properties of value estimators. This study theoretically examines the prediction error of value estimators whose estimated value is a linear function of a parameter. We extend a previously introduced framework of semiparametric statistical inference so that it applies to the analysis of the mean squared error (MSE) between the true value and the predicted value. This analysis allows us to investigate and compare the statistical prediction error of value estimators when the model is misspecified, i.e., when the value estimator cannot represent the true value for any choice of the parameter. We confirm our theoretical analysis on a toy problem.
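To make the setting concrete, the following is a minimal sketch of how the MSE of a linear value estimator could be measured with and without misspecification. It is not the toy problem from the paper: the 5-state ring random walk, the reward vector, the discount factor, the two feature matrices, and all function names are assumptions chosen for illustration. Least-squares temporal difference (LSTD) learning stands in as one common linear estimator; the abstract does not say which estimators are compared.

```python
import numpy as np


def true_values(P, r, gamma):
    """Solve the Bellman equation V = r + gamma * P V exactly."""
    n = len(r)
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def sample_trajectory(P, r, n_steps, rng, s0=0):
    """Simulate the Markov reward process; the reward depends on the current state."""
    states = np.empty(n_steps + 1, dtype=int)
    states[0] = s0
    for t in range(n_steps):
        states[t + 1] = rng.choice(len(r), p=P[states[t]])
    rewards = r[states[:-1]]
    return states, rewards


def lstd(Phi, states, rewards, gamma):
    """LSTD estimate: solve A theta = b with
    A = sum_t phi_t (phi_t - gamma * phi_{t+1})^T and b = sum_t phi_t r_t."""
    X = Phi[states[:-1]]
    X_next = Phi[states[1:]]
    A = X.T @ (X - gamma * X_next)
    b = X.T @ rewards
    return np.linalg.solve(A, b)


def mse(Phi, theta, V):
    """Mean squared error between predicted and true values, averaged uniformly over states."""
    return float(np.mean((Phi @ theta - V) ** 2))


# Assumed toy problem: symmetric random walk on a ring of 5 states.
n_states, gamma = 5, 0.9
P = np.zeros((n_states, n_states))
for i in range(n_states):
    P[i, (i + 1) % n_states] = 0.5
    P[i, (i - 1) % n_states] = 0.5
r = np.array([0.0, 0.0, 0.0, 0.0, 1.0])

V_true = true_values(P, r, gamma)
rng = np.random.default_rng(0)
states, rewards = sample_trajectory(P, r, 20000, rng)

# Well-specified model: one indicator feature per state can represent any V.
Phi_tabular = np.eye(n_states)
# Misspecified model: bias plus state index cannot represent V on the ring.
Phi_misspecified = np.column_stack([np.ones(n_states), np.arange(n_states)])

mse_tabular = mse(Phi_tabular, lstd(Phi_tabular, states, rewards, gamma), V_true)
mse_misspecified = mse(
    Phi_misspecified, lstd(Phi_misspecified, states, rewards, gamma), V_true
)

print("MSE, well-specified features:", mse_tabular)
print("MSE, misspecified features:  ", mse_misspecified)
```

With the indicator features, the remaining error comes only from finite-sample noise, whereas the two-feature model keeps a nonzero error no matter how much data is collected, because no parameter value reproduces the true values. The uniform average over states in `mse` is also an assumption; weighting by the stationary distribution would be an equally reasonable choice (the two coincide for this ring).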