Many reinforcement learning methods are based on a function Q(s, a) whose value is the expected discounted total reward after performing action a in state s. This paper explores the implications of representing the Q function as Q(s, a) = s^T W a, where W is a matrix that is learned. In this representation, both s and a are real-valued vectors that may be high-dimensional. We show that action selection can be done using standard linear programming, and that W can be learned using standard linear regression within the algorithm known as fitted Q iteration. Experimentally, the resulting method learns to solve the mountain car task in a sample-efficient way. The same method is also applicable to an inventory management task where both the state space and the action space are continuous and high-dimensional.
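The sketch below illustrates the two claims in the abstract under one common assumption: actions live in a box [a_low, a_high]. For a fixed state s, Q(s, a) = s^T W a is linear in a, so maximizing it over the box is a standard linear program, and fitting W to bootstrapped targets is ordinary least squares over the features vec(s a^T). The function names, the transitions layout, and the use of scipy.optimize.linprog and numpy.linalg.lstsq are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from scipy.optimize import linprog

def greedy_action(W, s, a_low, a_high):
    """argmax_a s^T W a over the box [a_low, a_high].
    Q is linear in a for fixed s, so this is a linear program
    (illustrative assumption: box-constrained action space)."""
    c = -(W.T @ s)  # linprog minimizes, so negate the objective
    res = linprog(c, bounds=list(zip(a_low, a_high)))
    return res.x

def fitted_q_iteration(transitions, a_low, a_high, gamma=0.99, n_iters=50):
    """Fitted Q iteration with a bilinear Q function.
    transitions = (S, A, R, S2): arrays of states, actions,
    rewards, and next states, one row per observed transition."""
    S, A, R, S2 = transitions
    d_s, d_a = S.shape[1], A.shape[1]
    # Q(s, a) = s^T W a = vec(s a^T)^T vec(W), so the regression
    # features are the flattened outer products of state and action.
    Phi = np.einsum('ni,nj->nij', S, A).reshape(len(S), -1)
    w = np.zeros(d_s * d_a)
    for _ in range(n_iters):
        W = w.reshape(d_s, d_a)
        # Bootstrapped targets: r + gamma * max_a' Q(s', a'),
        # where the inner max is the LP above.
        q_next = np.array([s2 @ W @ greedy_action(W, s2, a_low, a_high)
                           for s2 in S2])
        y = R + gamma * q_next
        # Learn W by standard linear regression on the targets.
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w.reshape(d_s, d_a)
```

One consequence of the linearity in a is that the LP's maximizer lies at a vertex of the action box, so greedy actions under this representation are extreme ("bang-bang") within the feasible set; richer action preferences would require features of a beyond the raw action vector.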