Adaptive choice of grid and time in reinforcement learning
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
Artificial Intelligence
Using Options for Knowledge Transfer in Reinforcement Learning
Hierarchical control and learning for Markov decision processes
Hierarchical reinforcement learning with the MAXQ value function decomposition
Journal of Artificial Intelligence Research
Recursive Adaptation of Stepsize Parameter for Non-stationary Environments
PRIMA '09 Proceedings of the 12th International Conference on Principles of Practice in Multi-Agent Systems
Learning to control at multiple time scales
ICANN/ICONIP'03 Proceedings of the 2003 joint international conference on Artificial neural networks and neural information processing
Recursive Adaptation of Stepsize Parameter for Non-stationary Environments
ALA'09 Proceedings of the Second international conference on Adaptive and Learning Agents
Wireless Personal Communications: An International Journal
In recent years, hierarchical concepts of temporal abstraction have been integrated into the reinforcement learning framework to improve scalability. However, existing approaches are limited to domains where a decomposition into subtasks is known a priori. In this paper we propose the concept of explicitly selecting time-scale-related actions when no subgoal-related abstract actions are available. This is realised with multi-step actions on different time scales that are combined in a single action set. The special structure of this action set is exploited in the MSAQ-learning algorithm. By learning simultaneously on different, explicitly specified time scales, a considerable improvement in learning speed can be achieved. This is demonstrated on two benchmark problems.