Event-learning and robust policy heuristics

  • Authors:
  • András Lörincz; Imre Pólik; István Szita

  • Affiliation:
  • Department of Information Systems, Eötvös Loránd University, Pázmány Péter sétány 1/C, H-1117 Budapest, Hungary (all authors)

  • Venue:
  • Cognitive Systems Research
  • Year:
  • 2003

Abstract

In this paper we introduce a novel reinforcement learning algorithm called event-learning. The algorithm operates on events, ordered pairs of consecutive states. We define the event-value function and derive the corresponding learning rules. Combining our method with a well-known robust control method, the SDS algorithm, we introduce Robust Policy Heuristics (RPH). We show that RPH, a fast-adapting non-Markovian policy, is particularly useful with coarse models of the environment and can be useful for some partially observed systems. RPH may also help alleviate the 'curse of dimensionality'. Event-learning and RPH allow the time scales of value-function learning and of adaptation to be separated. We argue that modules are straightforward to define in event-learning, and that event-learning makes planning feasible in the RL framework. Computer simulations of a rotational inverted pendulum with coarse discretization demonstrate the principle.