Reinforcement Learning is a commonly used technique in robotics; however, traditional algorithms are unable to handle the large amounts of data coming from a robot's sensors, require long training times, cannot re-use learned policies in similar domains, and are restricted to discrete actions. This work introduces TS-RRLCA, a two-stage method that tackles these problems. In the first stage, low-level data coming from the robot's sensors is transformed into a more natural, relational representation based on rooms, walls, corners, doors and obstacles, significantly reducing the state space. We also use Behavioural Cloning, i.e., traces provided by the user, to learn in a few iterations a relational policy that can be re-used in different environments. In the second stage, we use Locally Weighted Regression to transform the initial discrete-action policy into a continuous-action policy. We tested our approach with a real service robot in different environments on navigation and following tasks. Results show that the policies can be used across domains and produce smoother, faster and shorter paths than the original discrete-action policies.
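The second stage can be illustrated with a minimal sketch of Locally Weighted Regression: given the states visited by the discrete policy and the (numeric) actions it chose there, a query state receives a continuous action as the Gaussian-kernel-weighted average of nearby discrete actions. This is an assumption-laden simplification of the paper's method; the function name, the Gaussian kernel, and the `bandwidth` parameter are illustrative choices, not the authors' exact formulation.

```python
import numpy as np

def lwr_continuous_action(query_state, states, actions, bandwidth=1.0):
    """Blend discrete policy actions into one continuous action.

    A sketch of Locally Weighted Regression: each stored (state, action)
    pair contributes to the output action with a Gaussian weight that
    decays with its distance from the query state.
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    # Squared Euclidean distance from the query state to every stored state.
    d2 = np.sum((states - np.asarray(query_state, dtype=float)) ** 2, axis=1)
    # Gaussian kernel weights; bandwidth controls how local the blend is.
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Weighted average of the discrete actions yields a continuous action.
    return float(w @ actions / w.sum())
```

For a query state halfway between two states whose discrete actions are 0.0 and 1.0, the blended action is 0.5, i.e., the continuous policy interpolates smoothly between the discrete commands rather than switching abruptly.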