Continuous-Action Q-Learning

  • Authors:
  • José Del R. Millán; Daniele Posenato; Eric Dedieu

  • Affiliations:
  • Joint Research Centre, European Commission, 21020 Ispra (VA), Italy. Emails: jose.millan@jrc.it (http://sta.jrc.it/sba/staff/jose.htm), daniele.posenato@jrc.it, eric.dedieu@jrc.it

  • Venue:
  • Machine Learning
  • Year:
  • 2002

Abstract

This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the “winning unit” weighted by their Q-values. Then, TD(λ) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite horizon robotics task.
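The action-selection and update scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the non-negative weighting of discrete actions is done here with a softmax over the winning unit's Q-values, and the TD(λ) eligibility traces are reduced to a one-step TD update for brevity.

```python
import numpy as np

# Hypothetical sketch: one ITPM unit holds Q-values for M discrete actions.
M = 5
discrete_actions = np.linspace(-1.0, 1.0, M)  # e.g. candidate wheel speeds
q_values = np.zeros(M)                        # could be biased by domain knowledge

def continuous_action(q, actions):
    """Continuous action as an average of the unit's discrete actions,
    weighted by their Q-values (softmax weighting is an assumption)."""
    w = np.exp(q - q.max())
    w /= w.sum()
    return float(np.dot(w, actions)), w

def td_update(q, w, reward, q_next_max, alpha=0.1, gamma=0.95):
    """Distribute the TD error over the discrete actions in proportion to
    their contribution w (eligibility traces of TD(lambda) omitted)."""
    td_error = reward + gamma * q_next_max - np.dot(w, q)
    return q + alpha * td_error * w

# One interaction step with the winning unit.
a, w = continuous_action(q_values, discrete_actions)
q_values = td_update(q_values, w, reward=0.2, q_next_max=q_values.max())
```

In the paper's setting a new unit, with its Q-values initialized from domain knowledge, would be created whenever the current input falls outside the region covered by existing ITPM units; the sketch above only covers the per-step action blending and credit assignment.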