Elements of information theory
Elements of information theory
Technical Note: \cal Q-Learning
Machine Learning
Reinforcement learning with replacing eligibility traces
Machine Learning - Special issue on reinforcement learning
Dynamic Programming and Optimal Control
Dynamic Programming and Optimal Control
Introduction to Reinforcement Learning
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Introduction to Stochastic Search and Optimization
Introduction to Stochastic Search and Optimization
Efficient Exploration In Reinforcement Learning
Efficient Exploration In Reinforcement Learning
Graph theory: An algorithmic approach (Computer science and applied mathematics)
Graph theory: An algorithmic approach (Computer science and applied mathematics)
Dynamic task allocation within an open service-oriented MAS architecture
Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems
Randomized shortest-path problems: Two related models
Neural Computation
Hi-index | 0.00 |
This paper presents a framework allowing to tune continual exploration in an optimal way. It first quantifies the rate of exploration by defining the degree of exploration of a state as the probability-distribution entropy for choosing an admissible action. Then, the exploration/exploitation tradeoff is stated as a global optimization problem: find the exploration strategy that minimizes the expected cumulated cost, while maintaining fixed degrees of exploration at same nodes. In other words, “exploitation” is maximized for constant “exploration”. This formulation leads to a set of nonlinear updating rules reminiscent of the value-iteration algorithm. Convergence of these rules to a local minimum can be proved for a stationary environment. Interestingly, in the deterministic case, when there is no exploration, these equations reduce to the Bellman equations for finding the shortest path while, when it is maximum, a full “blind” exploration is performed.