Fuzzy epoch-incremental reinforcement learning algorithm
ICAISC'12 Proceedings of the 11th international conference on Artificial Intelligence and Soft Computing - Volume Part I
A basic reinforcement learning algorithm, such as Q-learning, is characterized by a short, inexpensive single learning step; however, the number of epochs necessary to achieve the optimal policy is often unsatisfactory. Many methods reduce the number of required epochs, such as TD(λ > 0), Dyna, or prioritized sweeping, but their learning time per step is considerable. This paper proposes a combination of the Q-learning algorithm, performed in incremental mode, with an acceleration method, executed in epoch mode, based on an environment model and the distance to the terminal state. This approach maintains the short duration of a single learning step while achieving efficiency comparable to Dyna or prioritized sweeping. The proposed algorithm is compared with Q(λ)-learning, Dyna-Q, and prioritized sweeping in experiments on three maze tasks. The learning time and the number of epochs necessary to reach the terminal state are used to evaluate the efficiency of the compared algorithms.
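To make the trade-off in the abstract concrete, the following is a minimal sketch of tabular Dyna-Q on a hypothetical 1-D corridor task (states 0–4, terminal at 4). It is not the paper's fuzzy epoch-incremental algorithm; it only illustrates the two components the abstract contrasts: the cheap direct Q-learning update and the extra per-step planning updates from a learned model that make Dyna-style methods costlier per step but faster in epochs. All environment details, hyperparameters, and names here are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor: states 0..4, terminal state 4 yields reward 1.
N_STATES, TERMINAL = 5, 4
ACTIONS = (-1, +1)  # step left / step right

def env_step(s, a):
    """Deterministic transition; walls clip movement to [0, TERMINAL]."""
    s2 = min(max(s + a, 0), TERMINAL)
    return s2, (1.0 if s2 == TERMINAL else 0.0)

def dyna_q(episodes=50, planning_steps=5, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)   # Q[(state, action)] -> value estimate
    model = {}               # model[(state, action)] -> (next_state, reward)
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
            s2, r = env_step(s, a)
            # Direct RL: one Q-learning update -- the cheap single learning step.
            best = max(Q[(s2, a_)] for a_ in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            # Model learning + planning: the extra work that lengthens each
            # step in Dyna-style methods but cuts the number of epochs.
            model[(s, a)] = (s2, r)
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                ps2, pr = model[(ps, pa)]
                pbest = max(Q[(ps2, a_)] for a_ in ACTIONS)
                Q[(ps, pa)] += alpha * (pr + gamma * pbest - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
# After training, moving right should be valued above moving left everywhere.
assert all(Q[(s, +1)] > Q[(s, -1)] for s in range(TERMINAL))
```

Setting `planning_steps=0` reduces the loop to plain Q-learning, which is exactly the per-step-cost versus epoch-count trade-off the proposed epoch-incremental approach aims to balance.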