Reinforcement algorithms using functional approximation for generalization and their application to cart centering and fractal compression

  • Authors:
  • Clifford Claussen; Srinivas Gutta; Harry Wechsler

  • Affiliations:
  • Department of Computer Science, George Mason University, Fairfax, VA (all authors)

  • Venue:
  • IJCAI'99 Proceedings of the 16th International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1999


Abstract

We address the conflict between identification and control, or alternatively the conflict between exploration and exploitation, within the framework of reinforcement learning. Q-learning has recently become a popular off-policy reinforcement learning method. The conflict between exploration and exploitation slows down Q-learning algorithms; their performance does not scale up and degrades rapidly as the number of states and actions increases. One reason for this slowness is that exploration lacks the ability to extrapolate and interpolate from prior learning and to a large extent has to "reinvent the wheel". Moreover, not all reinforcement learning problems one encounters are finite-state, finite-action systems. Our approach to solving continuous state and action problems is to approximate the continuous state and action spaces with finite sets of states and actions and then to apply a finite-state, finite-action learning method. This approach provides the means for solving continuous state and action problems, but it does not yet address the performance problem associated with scaling up the number of states and actions. We address the scaling problem using functional approximation methods. Towards that end, this paper introduces two new reinforcement learning algorithms, QLVQ and Quad-Q-learning, and shows their successful application to cart centering and fractal compression.
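The first ingredient of the abstract's recipe, quantizing a continuous state and action space onto a finite grid and then running an ordinary tabular Q-learning loop over the resulting finite problem, can be illustrated on cart centering. The sketch below is a minimal illustration under assumed double-integrator cart dynamics, grid sizes, reward shape, and learning parameters; it is not the paper's QLVQ or Quad-Q-learning algorithms, which additionally use functional approximation to address the scaling problem.

```python
import numpy as np

# Minimal sketch: discretize a continuous cart-centering problem
# (assumed double-integrator dynamics, illustrative parameters) and
# run tabular Q-learning with epsilon-greedy exploration.

rng = np.random.default_rng(0)

# Uniform grids over position and velocity, and a small finite action set.
POS_BINS = np.linspace(-1.0, 1.0, 11)
VEL_BINS = np.linspace(-1.0, 1.0, 11)
ACTIONS = np.array([-1.0, 0.0, 1.0])   # applied force
DT = 0.05

def discretize(x, v):
    """Map a continuous (position, velocity) pair to a grid-cell index."""
    i = np.clip(np.digitize(x, POS_BINS), 0, len(POS_BINS) - 1)
    j = np.clip(np.digitize(v, VEL_BINS), 0, len(VEL_BINS) - 1)
    return i * len(VEL_BINS) + j

def step(x, v, force):
    """Assumed double-integrator dynamics; reward is highest near the origin."""
    v = v + force * DT
    x = x + v * DT
    reward = -(x ** 2 + 0.1 * v ** 2)
    done = abs(x) > 1.5
    return x, v, reward, done

n_states = len(POS_BINS) * len(VEL_BINS)
Q = np.zeros((n_states, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(2000):
    x, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
    s = discretize(x, v)
    for t in range(200):
        # Epsilon-greedy exploration over the finite action set.
        a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[s]))
        x, v, r, done = step(x, v, ACTIONS[a])
        s2 = discretize(x, v)
        # Standard one-step Q-learning update on the discretized problem.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
        s = s2
        if done:
            break

print("Greedy action near the origin:", ACTIONS[int(np.argmax(Q[discretize(0.0, 0.0)]))])
```

The limitation the paper targets is visible here: the table has one entry per grid cell and action, so refining the grids grows the table multiplicatively, which is where functional approximation over states and actions becomes necessary.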