In this paper, we consider reinforcement learning in systems with an unknown environment, where the agent must trade off efficiently between exploration (long-term optimization) and exploitation (short-term optimization). The ε-greedy algorithm is a near-greedy action selection rule: it behaves greedily (exploitation) most of the time, but with a small probability ε it instead selects an action at random (exploration). Prior work has shown that such random exploration drives the agent towards poorly modeled states. This study therefore evaluates the role of heuristic-based exploration in reinforcement learning. We propose three methods: neighborhood-search-based exploration, simulated-annealing-based exploration, and tabu-search-based exploration. All three follow the same rule: "Explore the most unvisited state." In simulation, these techniques are evaluated and compared on a discrete reinforcement learning task (robot navigation).
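The contrast the abstract draws between ε-greedy's uniform random exploration and the "explore the most unvisited state" heuristic can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, the tabular Q-value representation, and the per-action visit counts are assumptions made for the example.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """epsilon-greedy selection: exploit the greedy action with
    probability 1 - epsilon, otherwise pick uniformly at random."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: uninformed random choice
    # exploit: action with the highest estimated value
    return max(range(len(q_values)), key=q_values.__getitem__)

def least_visited(q_values, visit_counts):
    """Count-based heuristic in the spirit of 'explore the most
    unvisited state': prefer the action tried least often, steering
    exploration towards poorly modeled states instead of random ones."""
    return min(range(len(q_values)), key=visit_counts.__getitem__)
```

With ε = 0 the first rule reduces to pure greedy exploitation, while the second rule ignores the value estimates entirely and is the kind of directed exploration the proposed neighborhood-search, simulated-annealing, and tabu-search variants build on.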