Introduction to Reinforcement Learning
Decision making in complex, multi-agent, dynamic environments such as Rescue Simulation is a challenging problem in Artificial Intelligence. Uncertainty, noisy input data, and stochastic behavior, common difficulties of real-time environments, make decision making in such settings even harder. Our approach to the bottleneck of dynamicity and the variety of conditions in these situations is reinforcement learning. Classic reinforcement learning methods usually maintain state and action value functions and apply temporal difference (TD) updates. Function approximation is an alternative way to represent state and action value functions compactly. Many reinforcement learning methods for continuous action and state spaces combine function approximation with TD updates, such as TD, LSTD, and iLSTD. This paper presents a new approach to online reinforcement learning in continuous action or state spaces that does not rely on TD updates; we call it Parametric Reinforcement Learning. The method is applied to the decision-making process of the Police Force agents in the RoboCup Rescue Simulation, and the favorable results of this application are shown in this paper. Our simulation results indicate that the method increases the speed of learning while remaining simple to use, with very low memory usage and low computational cost.
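For context on the TD-with-function-approximation baseline the abstract contrasts against, a minimal sketch of a TD(0) update for a linear value function is shown below. This is illustrative background only, not the paper's Parametric Reinforcement Learning method; the feature vectors, step size, and discount factor are assumed for the example.

```python
import numpy as np

def td0_update(w, phi_s, phi_s_next, reward, alpha=0.1, gamma=0.99):
    """One TD(0) step for a linear value function V(s) = w . phi(s).

    w          -- weight vector of the linear approximator
    phi_s      -- feature vector of the current state
    phi_s_next -- feature vector of the successor state
    """
    td_error = reward + gamma * (w @ phi_s_next) - (w @ phi_s)
    # Move the weights along the gradient of V(s), scaled by the TD error.
    return w + alpha * td_error * phi_s

# Toy transition: 3 features, reward 1.0, starting from zero weights.
w = np.zeros(3)
phi_s = np.array([1.0, 0.0, 0.0])
phi_s_next = np.array([0.0, 1.0, 0.0])
w = td0_update(w, phi_s, phi_s_next, reward=1.0)
```

With zero initial weights the TD error equals the reward, so only the active feature of the current state is credited; methods such as LSTD and iLSTD replace this single-sample update with least-squares estimates over many transitions.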