Two-step gradient-based reinforcement learning for underwater robotics behavior learning

Authors:
Andres El-Fakdi; Marc Carreras
Venue:
Robotics and Autonomous Systems
Year:
2013

Abstract

This article proposes a field application of a Reinforcement Learning (RL) control system that solves the action selection problem of an autonomous robot in a cable tracking task. The Ictineu Autonomous Underwater Vehicle (AUV) learns to perform a vision-based cable tracking task through a two-step learning process. First, a policy is computed in simulation, where a hydrodynamic model of the vehicle simulates the cable-following task; the model is identified with a specially designed Least Squares (LS) technique. Once the simulated results are accurate enough, in the second step the policy learnt in simulation is transferred to the vehicle, where learning continues in the real environment and improves the initial policy. The Natural Actor-Critic (NAC) algorithm has been selected to solve the problem. This Actor-Critic (AC) algorithm aims to combine the advantages of Policy Gradient (PG) and Value Function (VF) techniques to achieve fast convergence. The work presented includes extensive real experimentation. The main objective of this work is to demonstrate the feasibility of RL techniques for learning autonomous underwater tasks; the choice of a cable tracking task is motivated by growing industrial demand for technology to survey and maintain underwater structures.
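To make the two-step scheme and the NAC update more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It uses the episodic form of NAC, in which the critic reduces to a least-squares regression of episode returns on the summed policy score functions, and this may differ from the NAC variant used in the article. The identified hydrodynamic model is replaced by a hypothetical 1-D offset model x' = drift·x + a, where x is the cable's lateral offset in the image. The drift values (0.8 for the "simulator", 0.9 for the "real plant"), the Gaussian policy a ~ N(θx, σ²) with σ = 0.5, the quadratic reward, the 20-step horizon, the step size α = 1, and the function names `run_episode`, `enac_step`, `average_return` and `two_step_learning` are all illustrative assumptions.

```python
import random


def run_episode(theta, drift, sigma, rng, horizon=20, x0=1.0):
    """Roll out one episode of a hypothetical 1-D cable-tracking task.

    x is the lateral offset of the cable in the image; the Gaussian policy
    commands a ~ N(theta * x, sigma^2) and the offset evolves as
    x' = drift * x + a (an assumed toy model, not the identified AUV model).
    Returns (episode return, sum over steps of d/dtheta log pi(a | x)).
    """
    x, ret, psi = x0, 0.0, 0.0
    for _ in range(horizon):
        mean = theta * x
        a = rng.gauss(mean, sigma)
        # Score function of the Gaussian policy with respect to theta.
        psi += (a - mean) * x / sigma ** 2
        # Assumed reward: penalize tracking error and control effort.
        ret += -(x * x + 0.1 * a * a)
        x = drift * x + a
    return ret, psi


def enac_step(theta, drift, sigma, rng, n_episodes=100, alpha=1.0):
    """One episodic Natural Actor-Critic update for a single policy parameter.

    Least-squares regression of each return R_i on [psi_i, 1]: the slope w
    estimates the natural gradient (the centered variance of psi estimates
    the Fisher information) and the intercept acts as a learned baseline.
    """
    samples = [run_episode(theta, drift, sigma, rng) for _ in range(n_episodes)]
    n = len(samples)
    mean_r = sum(r for r, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    cov = sum((p - mean_p) * (r - mean_r) for r, p in samples)
    var = sum((p - mean_p) ** 2 for _, p in samples)
    w = cov / var  # slope of the regression with intercept
    return theta + alpha * w


def average_return(theta, drift, sigma, rng, n=500):
    """Monte Carlo estimate of the expected return of a fixed policy."""
    return sum(run_episode(theta, drift, sigma, rng)[0] for _ in range(n)) / n


def two_step_learning(seed=0, sigma=0.5):
    rng = random.Random(seed)
    theta = 0.0  # initial policy: no steering correction
    # Step 1: learn against the (hypothetical) simulator model, drift 0.8.
    for _ in range(30):
        theta = enac_step(theta, 0.8, sigma, rng)
    theta_sim = theta
    # Step 2: transfer the policy to the (hypothetical) real plant, drift 0.9,
    # and keep learning from the simulated starting point.
    for _ in range(20):
        theta = enac_step(theta, 0.9, sigma, rng)
    return theta_sim, theta


if __name__ == "__main__":
    theta_sim, theta_real = two_step_learning()
    print(f"gain after simulation: {theta_sim:.3f}, "
          f"after real-plant learning: {theta_real:.3f}")
```

Starting step 2 from `theta_sim` instead of from zero mirrors the idea in the abstract: the real-environment phase only has to correct for the mismatch between the model and the plant, rather than learn the task from scratch.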