Continuous-Action Q-Learning

  • Authors:
  • José Del R. Millán; Daniele Posenato; Eric Dedieu

  • Affiliations:
  • Joint Research Centre, European Commission, 21020 Ispra (VA), Italy. Emails: jose.millan@jrc.it (http://sta.jrc.it/sba/staff/jose.htm), daniele.posenato@jrc.it, eric.dedieu@jrc.it

  • Venue:
  • Machine Learning
  • Year:
  • 2002

Abstract

This paper presents a Q-learning method that works in continuous domains. Other characteristics of our approach are the use of an incremental topology preserving map (ITPM) to partition the input space, and the incorporation of bias to initialize the learning process. A unit of the ITPM represents a limited region of the input space and maps it onto the Q-values of M possible discrete actions. The resulting continuous action is an average of the discrete actions of the “winning unit” weighted by their Q-values. Then, TD(λ) updates the Q-values of the discrete actions according to their contribution. Units are created incrementally and their associated Q-values are initialized by means of domain knowledge. Experimental results in robotics domains show the superiority of the proposed continuous-action Q-learning over the standard discrete-action version in terms of both asymptotic performance and speed of learning. The paper also reports a comparison of discounted-reward against average-reward Q-learning in an infinite horizon robotics task.
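The action-selection and update scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the non-negative weighting of discrete actions is done here with a softmax over the winning unit's Q-values, and the TD(λ) eligibility traces are reduced to a one-step TD update for brevity.

```python
import numpy as np

# Hypothetical sketch: one ITPM unit holds Q-values for M discrete actions.
M = 5
discrete_actions = np.linspace(-1.0, 1.0, M)  # e.g. candidate wheel speeds
q_values = np.zeros(M)                        # could be biased by domain knowledge

def continuous_action(q, actions):
    """Continuous action as an average of the unit's discrete actions,
    weighted by their Q-values (softmax weighting is an assumption)."""
    w = np.exp(q - q.max())
    w /= w.sum()
    return float(np.dot(w, actions)), w

def td_update(q, w, reward, q_next_max, alpha=0.1, gamma=0.95):
    """Distribute the TD error over the discrete actions in proportion to
    their contribution w (eligibility traces of TD(lambda) omitted)."""
    td_error = reward + gamma * q_next_max - np.dot(w, q)
    return q + alpha * td_error * w

# One interaction step with the winning unit.
a, w = continuous_action(q_values, discrete_actions)
q_values = td_update(q_values, w, reward=0.2, q_next_max=q_values.max())
```

In the paper's setting a new unit, with its Q-values initialized from domain knowledge, would be created whenever the current input falls outside the region covered by existing ITPM units; the sketch above only covers the per-step action blending and credit assignment.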