Function approximators are often used in reinforcement learning tasks with large or continuous state spaces. Artificial neural networks, and recurrent neural networks in particular, are popular function approximators, especially in tasks that require some form of memory, such as real-world partially observable scenarios. However, convergence guarantees for such methods are rarely available. Here, we propose a method based on a novel class of RNNs, the echo state networks. A proof of convergence to a bounded region is provided for k-order Markov decision processes. Runs on POMDPs were performed to test and illustrate the workings of the architecture.
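The core echo-state-network idea used above, a fixed random recurrent reservoir whose contractive dynamics provide fading memory, with only a linear readout trained, can be illustrated with a minimal sketch in plain Python. This is an illustrative supervised toy (reproducing an input delayed by two steps), not the paper's reinforcement-learning architecture; the class name, the delayed-sine task, the row-sum contraction heuristic, and all parameter values are assumptions made for the sketch.

```python
import math
import random

random.seed(0)

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][c] * w[c] for c in range(i + 1, n))) / M[i][i]
    return w

class EchoStateNetwork:
    """Fixed random recurrent reservoir; only the linear readout is trained."""

    def __init__(self, n_in, n_res=50, contraction=0.9, ridge=1e-3):
        self.W_in = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
                     for _ in range(n_res)]
        W = [[random.uniform(-0.5, 0.5) for _ in range(n_res)]
             for _ in range(n_res)]
        # Scale so the maximum absolute row sum -- an upper bound on the
        # spectral radius -- equals `contraction` < 1, a simple sufficient
        # condition for the echo state property.
        bound = max(sum(abs(w) for w in row) for row in W)
        self.W = [[w * contraction / bound for w in row] for row in W]
        self.x = [0.0] * n_res
        self.ridge = ridge

    def step(self, u):
        """Advance the reservoir state by one input vector and return it."""
        pre = matvec(self.W, self.x)
        drive = matvec(self.W_in, u)
        self.x = [math.tanh(a + b) for a, b in zip(pre, drive)]
        return self.x

    def fit(self, inputs, targets, washout=50):
        """Ridge-regress the scalar readout onto collected reservoir states."""
        states = [self.step(u) for u in inputs]
        X, Y = states[washout:], targets[washout:]
        n = len(X[0])
        A = [[sum(x[i] * x[j] for x in X) + (self.ridge if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        b = [sum(x[i] * y for x, y in zip(X, Y)) for i in range(n)]
        self.w_out = solve(A, b)

    def predict(self, inputs):
        return [sum(w * s for w, s in zip(self.w_out, self.step(u)))
                for u in inputs]

# Toy memory task: reproduce the input signal delayed by two steps.
inputs = [[math.sin(0.2 * t)] for t in range(400)]
targets = [math.sin(0.2 * (t - 2)) for t in range(400)]
esn = EchoStateNetwork(n_in=1)
esn.fit(inputs[:300], targets[:300])
preds = esn.predict(inputs[300:])
mse = sum((p - y) ** 2 for p, y in zip(preds, targets[300:])) / len(preds)
print(f"test MSE: {mse:.4f}")
```

Because the reservoir weights are never trained, learning reduces to a linear regression on the reservoir states, which is what makes convergence analyses of the kind mentioned in the abstract tractable; the tanh nonlinearity also keeps every state component strictly inside (-1, 1), i.e. in a bounded region.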