Real-world control problems are often modeled as Markov Decision Processes (MDPs) with discrete action spaces, to facilitate the use of the many reinforcement learning algorithms that exist to solve such MDPs. For many of these problems an underlying continuous action space can be assumed. We investigate the performance of the Cacla algorithm, which uses a continuous actor, on two such MDPs: the mountain car and the cart pole. We show that Cacla has clear advantages over discrete algorithms such as Q-learning and Sarsa, even though its continuous actions are rounded onto the same finite action space, which may contain only a small number of actions. In particular, we show that Cacla retains much better performance when, after some period of learning, the action space is changed by removing some of the actions.
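For concreteness, below is a minimal sketch of the Cacla update rule described above, with the rounding of continuous actions onto a finite action set. It assumes a linear critic and actor over a state feature vector; all names (phi, n_features, alpha, beta, gamma, sigma) and the specific feature map are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Illustrative hyperparameters and linear function approximators; these are
# assumptions for the sketch, not values from the paper.
n_features = 8
v_weights = np.zeros(n_features)   # critic: V(s) ~ v_weights . phi(s)
a_weights = np.zeros(n_features)   # actor: Ac(s) ~ a_weights . phi(s)
alpha, beta, gamma, sigma = 0.1, 0.01, 0.99, 0.3

def phi(state):
    """Placeholder feature map; a real agent would use e.g. tile coding."""
    return state  # assumes the state is already an n_features vector

def select_action(s, n_discrete_actions):
    """Gaussian exploration around the actor's output, then rounding the
    continuous action onto the finite action set, as the abstract describes.
    Returns both the continuous action (used for learning) and its
    discretized counterpart (executed in the environment)."""
    a_continuous = a_weights @ phi(s) + np.random.normal(0.0, sigma)
    a_discrete = int(np.clip(np.rint(a_continuous), 0, n_discrete_actions - 1))
    return a_continuous, a_discrete

def cacla_step(s, s_next, reward, a_continuous):
    """One Cacla update: train the critic on the TD error, and move the
    actor toward the explored action only when that error is positive."""
    f, f_next = phi(s), phi(s_next)
    td_error = reward + gamma * v_weights @ f_next - v_weights @ f
    v_weights[:] += alpha * td_error * f                # critic update
    if td_error > 0:                                    # Cacla's key rule
        v_prediction = a_weights @ f
        a_weights[:] += beta * (a_continuous - v_prediction) * f
```

The distinctive design choice this sketch illustrates is that the actor only moves toward the exploratory action when the TD error is positive, i.e. when the action turned out better than expected; whether the update should use the continuous action or its rounded version in the discretized setting is not something this sketch resolves, and the continuous one is used here as an assumption.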