Many reinforcement learning methods are based on a function Q(s, a) whose value is the expected discounted total reward after performing action a in state s. This paper explores the implications of representing the Q function as Q(s, a) = s^T W a, where W is a matrix that is learned. In this representation, both s and a are real-valued vectors that may be high-dimensional. We show that action selection can be done using standard linear programming, and that W can be learned using standard linear regression within the algorithm known as fitted Q iteration. Experimentally, the resulting method learns to solve the mountain car task in a sample-efficient way. The same method is also applicable to an inventory management task where both the state space and the action space are continuous and high-dimensional.
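The sketch below illustrates the two claims in the abstract under one common assumption: actions live in a box [a_low, a_high]. For a fixed state s, Q(s, a) = s^T W a is linear in a, so maximizing it over the box is a standard linear program, and fitting W to bootstrapped targets is ordinary least squares over the features vec(s a^T). The function names, the transitions layout, and the use of scipy.optimize.linprog and numpy.linalg.lstsq are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from scipy.optimize import linprog

def greedy_action(W, s, a_low, a_high):
    """argmax_a s^T W a over the box [a_low, a_high].
    Q is linear in a for fixed s, so this is a linear program
    (illustrative assumption: box-constrained action space)."""
    c = -(W.T @ s)  # linprog minimizes, so negate the objective
    res = linprog(c, bounds=list(zip(a_low, a_high)))
    return res.x

def fitted_q_iteration(transitions, a_low, a_high, gamma=0.99, n_iters=50):
    """Fitted Q iteration with a bilinear Q function.
    transitions = (S, A, R, S2): arrays of states, actions,
    rewards, and next states, one row per observed transition."""
    S, A, R, S2 = transitions
    d_s, d_a = S.shape[1], A.shape[1]
    # Q(s, a) = s^T W a = vec(s a^T)^T vec(W), so the regression
    # features are the flattened outer products of state and action.
    Phi = np.einsum('ni,nj->nij', S, A).reshape(len(S), -1)
    w = np.zeros(d_s * d_a)
    for _ in range(n_iters):
        W = w.reshape(d_s, d_a)
        # Bootstrapped targets: r + gamma * max_a' Q(s', a'),
        # where the inner max is the LP above.
        q_next = np.array([s2 @ W @ greedy_action(W, s2, a_low, a_high)
                           for s2 in S2])
        y = R + gamma * q_next
        # Learn W by standard linear regression on the targets.
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w.reshape(d_s, d_a)
```

One consequence of the linearity in a is that the LP's maximizer lies at a vertex of the action box, so greedy actions under this representation are extreme ("bang-bang") within the feasible set; richer action preferences would require features of a beyond the raw action vector.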