Fuzzy epoch-incremental reinforcement learning algorithm
ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part I
A basic reinforcement learning algorithm, such as Q-learning, is characterized by a short, inexpensive single learning step; however, the number of epochs necessary to achieve the optimal policy is often unsatisfactory. Many methods reduce the number of required epochs, such as TD(λ > 0), Dyna, or prioritized sweeping, but their learning time per step is considerable. This paper proposes a combination of the Q-learning algorithm, performed in incremental mode, with an acceleration method, executed in epoch mode, based on an environment model and the distance to the terminal state. This approach maintains the short duration of a single learning step while achieving efficiency comparable to Dyna or prioritized sweeping. The proposed algorithm is compared with Q(λ)-learning, Dyna-Q, and prioritized sweeping in experiments on three maze tasks. The learning time and the number of epochs necessary to reach the terminal state are used to evaluate the efficiency of the compared algorithms.
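To make the trade-off in the abstract concrete, the following is a minimal sketch of tabular Dyna-Q on a hypothetical 1-D corridor task (states 0–4, terminal at 4). It is not the paper's fuzzy epoch-incremental algorithm; it only illustrates the two components the abstract contrasts: the cheap direct Q-learning update and the extra per-step planning updates from a learned model that make Dyna-style methods costlier per step but faster in epochs. All environment details, hyperparameters, and names here are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor: states 0..4, terminal state 4 yields reward 1.
N_STATES, TERMINAL = 5, 4
ACTIONS = (-1, +1)  # step left / step right

def env_step(s, a):
    """Deterministic transition; walls clip movement to [0, TERMINAL]."""
    s2 = min(max(s + a, 0), TERMINAL)
    return s2, (1.0 if s2 == TERMINAL else 0.0)

def dyna_q(episodes=50, planning_steps=5, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)   # Q[(state, action)] -> value estimate
    model = {}               # model[(state, action)] -> (next_state, reward)
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
            s2, r = env_step(s, a)
            # Direct RL: one Q-learning update -- the cheap single learning step.
            best = max(Q[(s2, a_)] for a_ in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            # Model learning + planning: the extra work that lengthens each
            # step in Dyna-style methods but cuts the number of epochs.
            model[(s, a)] = (s2, r)
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                ps2, pr = model[(ps, pa)]
                pbest = max(Q[(ps2, a_)] for a_ in ACTIONS)
                Q[(ps, pa)] += alpha * (pr + gamma * pbest - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
# After training, moving right should be valued above moving left everywhere.
assert all(Q[(s, +1)] > Q[(s, -1)] for s in range(TERMINAL))
```

Setting `planning_steps=0` reduces the loop to plain Q-learning, which is exactly the per-step-cost versus epoch-count trade-off the proposed epoch-incremental approach aims to balance.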