The Two Facets of the Exploration-Exploitation Dilemma

Authors:
Kaifu Zhang;Wei Pan
Affiliations:
Tsinghua University, China;Tsinghua University, China
Venue:
IAT '06 Proceedings of the IEEE/WIC/ACM international conference on Intelligent Agent Technology
Year:
2006

Abstract

This paper proposes an algorithm to better address the exploration-exploitation dilemma faced by model-less reinforcement learning agents. The contribution is twofold. (1) We distinguish the two facets of the exploration-exploitation dilemma. In some cases the agent faces a non-stationary environment and must choose the best moment to explore in order to adapt to changes; in other cases the agent faces a relatively large state-action space and must choose the most promising subset of states and actions to explore. Within this two-facet framework, we compare the relative advantages and limitations of two previously proposed algorithms in different situations. (2) We unify these two algorithms into a new algorithm that works fairly well in all tested situations.
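
The first facet (when to explore in a non-stationary environment) can be illustrated with a toy example. The sketch below is not the algorithm proposed in the paper or either of the two algorithms it compares; it is a minimal illustration under assumed conditions: a two-armed bandit with deterministic rewards whose best arm switches once, a constant step size, and a fixed periodic exploration schedule. The function name run and the parameters explore_every, change_at and alpha are hypothetical.

```python
def run(explore_every, steps=200, change_at=100, alpha=0.5):
    """Toy two-armed bandit whose best arm switches at step `change_at`.

    explore_every=0 means the agent never explores (purely greedy).
    Returns the total reward collected after the switch.
    All names and values here are illustrative assumptions.
    """
    q = [0.0, 0.0]  # action-value estimates for arm 0 and arm 1
    reward_after_change = 0.0
    for t in range(steps):
        best_arm = 0 if t < change_at else 1  # hidden from the agent
        greedy = 0 if q[0] >= q[1] else 1     # ties go to arm 0
        if explore_every and t % explore_every == explore_every - 1:
            action = 1 - greedy               # scheduled exploratory step
        else:
            action = greedy                   # exploit current estimates
        reward = 1.0 if action == best_arm else 0.0
        # A constant step size lets the estimate track a changing reward.
        q[action] += alpha * (reward - q[action])
        if t >= change_at:
            reward_after_change += reward
    return reward_after_change


if __name__ == "__main__":
    print("greedy only:        ", run(explore_every=0))
    print("explore every 10th: ", run(explore_every=10))
```

In this toy setting the purely greedy agent keeps pulling the arm that used to be best and collects nothing after the switch, while the agent that occasionally explores discovers the change and recovers. The second facet, choosing which states or actions to explore in a large space, is not covered by this sketch.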