Binary action search for learning continuous-action control policies

Authors:
Jason Pazis; Michail G. Lagoudakis
Affiliations:
Technical University of Crete, Chania, Crete, Greece; Technical University of Crete, Chania, Crete, Greece
Venue:
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Year:
2009

Abstract

Reinforcement Learning methods for controlling stochastic processes typically assume a small and discrete action space. While continuous action spaces are quite common in real-world problems, the most common approach still employed in practice is coarse discretization of the action space. This paper presents a novel method, called Binary Action Search, for realizing continuous-action policies by efficiently searching the entire action range through increment and decrement modifications to the values of the action variables according to an internal binary policy defined over an augmented state space. The proposed approach essentially approximates any continuous action space to arbitrary resolution and can be combined with any discrete-action reinforcement learning algorithm for learning continuous-action policies. Binary Action Search eliminates the restrictive modification steps of Adaptive Action Modification and requires no temporal action locality in the domain. Our approach is coupled with two well-known reinforcement learning algorithms (Least-Squares Policy Iteration and Fitted Q-Iteration), and its use and properties are thoroughly investigated and demonstrated on the continuous state-action Inverted Pendulum, Double Integrator, and Car on the Hill domains.
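
The search described in the abstract can be illustrated with a minimal sketch for a single action variable. The abstract does not give the procedure in detail, so the specifics here are assumptions made for illustration: the search starts at the midpoint of the action range, the step size is halved after every decision, and the names `binary_action_search`, `binary_policy`, and `n_steps` are hypothetical. The binary policy sees the augmented state, taken here to be the environment state together with the current candidate action, and answers only "increase" or "decrease".

```python
from typing import Callable, TypeVar

S = TypeVar("S")  # type of the environment state


def binary_action_search(
    state: S,
    binary_policy: Callable[[S, float], bool],
    a_min: float,
    a_max: float,
    n_steps: int = 8,
) -> float:
    """Select a continuous action using only binary decisions.

    Assumed scheme (not taken from the abstract): start at the midpoint
    of [a_min, a_max] and halve the step after each decision, so that
    n_steps decisions distinguish 2**n_steps action values.
    """
    action = 0.5 * (a_min + a_max)   # assumed starting point
    step = 0.25 * (a_max - a_min)    # assumed initial step size
    for _ in range(n_steps):
        # The binary policy is queried on the augmented state
        # (state, current action); True means "increase".
        if binary_policy(state, action):
            action += step
        else:
            action -= step
        step *= 0.5
    return action


def toy_policy(state: float, action: float) -> bool:
    """Hypothetical stand-in for a learned binary policy.

    It pretends the best action equals the state value and asks for an
    increase whenever the candidate action is below it.
    """
    return action < state


# Usage: with state 0.3 the search moves toward the action 0.3.
chosen = binary_action_search(0.3, toy_policy, a_min=-1.0, a_max=1.0, n_steps=10)
```

Under these assumptions, each additional decision halves the spacing between reachable action values, which is one way to read the abstract's claim of arbitrary resolution. In practice the binary policy would be learned by a discrete-action algorithm such as those named in the abstract, rather than hand-coded as in `toy_policy`.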