Two Steps Reinforcement Learning in Continuous Reinforcement Learning Tasks

Authors:
Iván López-Bueno;Javier García;Fernando Fernández
Affiliations:
Universidad Carlos III de Madrid, Madrid, Spain;Universidad Carlos III de Madrid, Madrid, Spain;Universidad Carlos III de Madrid, Madrid, Spain
Venue:
IWANN '09 Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part I: Bio-Inspired Systems: Computational and Ambient Intelligence
Year:
2009

Abstract

Two steps reinforcement learning is a technique that combines an iterative refinement of a Q function estimator, which can be used to obtain a state space discretization, with classical reinforcement learning algorithms like Q-learning or Sarsa. The method requires a discrete reward function, which permits learning an approximation of the Q function using classification algorithms. However, many domains have continuous reward functions that could only be tackled by discretizing the rewards. In this paper we propose solutions to this problem using discretization and regression methods. We demonstrate the usefulness of the resulting approach to improve the learning process in the Keepaway domain. We compare the obtained results with other techniques like VQQL and CMAC.
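
To make the reward-discretization idea concrete, here is a minimal sketch of one way continuous rewards could be mapped to a finite set of class labels, so that a classification algorithm could be trained on them. The function name `discretize_rewards`, the equal-width binning, and the use of bin midpoints as representative rewards are all assumptions made for illustration; the abstract does not say which discretization the paper uses.

```python
from typing import List, Sequence, Tuple


def discretize_rewards(
    rewards: Sequence[float], n_bins: int
) -> Tuple[List[int], List[float]]:
    """Map continuous rewards to equal-width bin labels (illustrative only).

    Returns (labels, centers): labels[i] is the bin index of rewards[i],
    and centers[b] is the midpoint of bin b, usable as its representative
    reward value.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if len(rewards) == 0:
        raise ValueError("rewards must be non-empty")

    lo, hi = min(rewards), max(rewards)
    width = (hi - lo) / n_bins
    if width == 0.0:
        # All rewards are identical, so a single class is enough.
        return [0] * len(rewards), [lo]

    # The maximum reward would fall just past the last bin, so clamp it.
    labels = [min(int((r - lo) / width), n_bins - 1) for r in rewards]
    centers = [lo + (b + 0.5) * width for b in range(n_bins)]
    return labels, centers


# Hypothetical continuous rewards observed during exploration.
labels, centers = discretize_rewards([0.0, 0.1, 0.5, 0.9, 1.0], n_bins=2)
print(labels)   # [0, 0, 1, 1, 1]
print(centers)  # [0.25, 0.75]
```

A coarser binning gives the classifier fewer classes to separate but loses reward resolution. The regression methods mentioned in the abstract avoid this trade-off by predicting the continuous value directly.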