Linear least-squares algorithms for temporal difference learning. Machine Learning, special issue on reinforcement learning.
Reinforcement learning with replacing eligibility traces. Machine Learning, special issue on reinforcement learning.
Introduction to Reinforcement Learning.
Investigating the Maximum Likelihood Alternative to TD(lambda). In ICML '02: Proceedings of the Nineteenth International Conference on Machine Learning.
Least-Squares Temporal Difference Learning. In ICML '99: Proceedings of the Sixteenth International Conference on Machine Learning.
Computing factored value functions for policies in structured MDPs. In IJCAI '99: Proceedings of the 16th International Joint Conference on Artificial Intelligence, Volume 2.
A popular form of policy evaluation for large Markov Decision Processes (MDPs) is the least-squares temporal difference (TD) method. Least-squares TD methods handle large MDPs by requiring, as prior knowledge, feature vectors that form a set of basis vectors compressing the system to a tractable size. Model-based methods have largely been ignored in favour of model-free TD algorithms because of two perceived drawbacks: slower computation and larger storage requirements. This paper challenges the perceived advantage of the temporal difference method over model-based methods in three distinct ways. First, it presents a new model-based approximate policy evaluation method that computes solutions faster than Boyan's least-squares TD method. Second, it introduces a new algorithm that derives basis vectors without any prior knowledge of the system. Third, it introduces an iteratively improving model-based value estimator that can run faster than standard TD methods. All of the algorithms require model storage, but remain competitive with model-free temporal difference methods in both computation and accuracy.
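To make the contrast concrete, below is a minimal Python sketch of the two approaches the abstract compares: Boyan-style least-squares TD(0), which fits value-function weights against supplied basis (feature) vectors, and a certainty-equivalence model-based evaluator that estimates the transition model from the same data and solves the Bellman equations directly. The function names, the tabular setting, and the toy chain MDP are illustrative assumptions, not the paper's actual algorithms.

    import numpy as np

    def lstd(transitions, phi, n_features, gamma=0.95):
        # Model-free LSTD(0) in the style of Boyan's method: accumulate
        # A = sum phi(s)(phi(s) - gamma*phi(s'))^T and b = sum r*phi(s),
        # then solve A theta = b so that V(s) ~= phi(s) . theta.
        A = np.zeros((n_features, n_features))
        b = np.zeros(n_features)
        for s, r, s_next in transitions:
            f, f_next = phi(s), phi(s_next)
            A += np.outer(f, f - gamma * f_next)
            b += r * f
        # Least-squares solve; A may be singular with few samples.
        theta, *_ = np.linalg.lstsq(A, b, rcond=None)
        return theta

    def model_based_eval(transitions, n_states, gamma=0.95):
        # Certainty-equivalence alternative: estimate (P_hat, r_hat) from
        # the same data, then solve (I - gamma * P_hat) v = r_hat directly.
        counts = np.zeros((n_states, n_states))
        rewards = np.zeros(n_states)
        visits = np.zeros(n_states)
        for s, r, s_next in transitions:
            counts[s, s_next] += 1
            rewards[s] += r
            visits[s] += 1
        visits = np.maximum(visits, 1)        # avoid division by zero
        P_hat = counts / visits[:, None]
        r_hat = rewards / visits
        return np.linalg.solve(np.eye(n_states) - gamma * P_hat, r_hat)

    # Hypothetical usage: a 3-state cyclic chain with reward 1 in state 2,
    # evaluated with indicator (one-hot) features as the basis vectors.
    rng = np.random.default_rng(0)
    transitions = [(int(s), float(s == 2), (int(s) + 1) % 3)
                   for s in rng.integers(0, 3, size=500)]
    phi = lambda s: np.eye(3)[s]
    print(lstd(transitions, phi, n_features=3))   # weights = state values here
    print(model_based_eval(transitions, n_states=3))

Both estimators consume the same transition samples; the model-based solve stores an n_states x n_states model (the storage cost the abstract notes), whereas LSTD only accumulates n_features x n_features statistics.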