Efficient Exploration in Reinforcement Learning Based on Utile Suffix Memory

  • Authors:
  • Arthur Pchelkin

  • Affiliations:
  • Faculty of Computer Science and Information Technology, Riga Technical University, 1 Kalku Str., LV-1658 Riga, Latvia, e-mail: arturp@balticom.lv

  • Venue:
  • Informatica
  • Year:
  • 2003

Abstract

Reinforcement learning addresses the question of how an autonomous agent can learn to choose optimal actions to achieve its goals. Efficient exploration is of fundamental importance for autonomous agents that learn to act. Previous approaches to exploration in reinforcement learning usually assume that the environment is fully observable. In contrast, we study the case in which the environment is only partially observable. We consider different exploration techniques applied to the learning algorithm “Utile Suffix Memory” and, in addition, discuss an adaptive fringe depth. Experimental results in a partially observable maze show that the choice of exploration technique has a serious impact on the performance of the learning algorithm.
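The abstract does not spell out which exploration techniques are compared. As a point of reference only, the sketch below shows two standard action-selection strategies commonly used for exploration in reinforcement learning: ε-greedy and Boltzmann (softmax) selection. The function names, the epsilon and temperature parameters, and the flat q_values list are illustrative assumptions, not the paper's interface; in Utile Suffix Memory the Q-values would be attached to the leaves of a learned suffix tree rather than to a flat table, which this sketch abstracts away.

```python
import math
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def boltzmann(q_values, temperature=1.0):
    """Sample an action with probability proportional to exp(Q / temperature).
    Lower temperatures concentrate mass on greedy actions; higher
    temperatures explore more uniformly."""
    prefs = [math.exp(q / temperature) for q in q_values]
    return random.choices(range(len(q_values)), weights=prefs, k=1)[0]

if __name__ == "__main__":
    q = [0.1, 0.5, 0.2]  # hypothetical Q-values for three actions in one state
    print(epsilon_greedy(q, epsilon=0.1))
    print(boltzmann(q, temperature=0.5))
```

Both strategies trade off exploitation of current value estimates against exploration of untried actions; under partial observability, as studied in the paper, the estimates themselves depend on how well the agent's memory (here, the suffix tree) disambiguates perceptually identical states.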