Reinforcement learning with a bilinear Q function

  • Author: Charles Elkan
  • Affiliation: Department of Computer Science and Engineering, University of California, San Diego, CA
  • Venue: EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
  • Year: 2011

Abstract

Many reinforcement learning methods are based on a function $Q(s,a)$ whose value is the discounted total reward expected after performing the action $a$ in the state $s$. This paper explores the implications of representing the Q function as $Q(s,a) = s^T W a$, where $W$ is a matrix that is learned. In this representation, both $s$ and $a$ are real-valued vectors that may have high dimension. We show that action selection can be done using standard linear programming, and that $W$ can be learned using standard linear regression in the algorithm known as fitted Q iteration. Experimentally, the resulting method learns to solve the mountain car task in a sample-efficient way. The same method is also applicable to an inventory management task where the state space and the action space are continuous and high-dimensional.
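
The two computational steps named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation. It assumes that feasible actions are described by linear inequalities and box bounds, that ordinary unregularized least squares is used for the regression, and that training data arrive as arrays of (state, action, reward, next state) samples. The function names `q_value`, `select_action`, `fit_W`, and `fitted_q_iteration` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def q_value(s, W, a):
    """Bilinear Q function: Q(s, a) = s^T W a."""
    return float(s @ W @ a)


def select_action(s, W, A_ub, b_ub, bounds):
    """Greedy action for state s under assumed linear action constraints.

    For fixed s, Q(s, a) is linear in a with coefficients c = W^T s, so
    maximizing it subject to A_ub @ a <= b_ub and box bounds is a linear
    program. linprog minimizes, so the objective is negated.
    """
    c = W.T @ s
    res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


def fit_W(S, A, targets):
    """Least-squares fit of W from rows of states S, actions A, and targets.

    s^T W a equals the dot product of vec(outer(s, a)) with vec(W), so each
    sample is one row of a linear regression over the entries of W.
    """
    X = np.einsum("ni,nj->nij", S, A).reshape(len(S), -1)
    w, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return w.reshape(S.shape[1], A.shape[1])


def fitted_q_iteration(S, A, R, S_next, gamma, n_iter, A_ub, b_ub, bounds):
    """Alternate between computing regression targets and refitting W."""
    W = np.zeros((S.shape[1], A.shape[1]))
    for _ in range(n_iter):
        # Target: reward plus discounted value of the best next action.
        best_next = np.array([
            q_value(sn, W, select_action(sn, W, A_ub, b_ub, bounds))
            for sn in S_next
        ])
        W = fit_W(S, A, R + gamma * best_next)
    return W
```

A call such as `fitted_q_iteration(S, A, R, S_next, 0.9, 20, None, None, [(0, 1)] * A.shape[1])` would run twenty sweeps with each action component restricted to the interval [0, 1]; the discount factor, sweep count, and bounds here are illustrative values only.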