Balancing exploration and exploitation ratio in reinforcement learning

Authors:
Ozkan Ozcan;Claudio Coreixas de Moraes;Jonathan Alt
Affiliations:
MOVES Institute, Naval Postgraduate School, Monterey, California;MOVES Institute, Naval Postgraduate School, Monterey, California;MOVES Institute, Naval Postgraduate School, Monterey, California
Venue:
Proceedings of the 2011 Military Modeling & Simulation Symposium
Year:
2011

Abstract

Controlling the ratio of exploration to exploitation in agent learning in dynamic environments remains a continuing challenge in the application of agent learning techniques. Representing human behavior requires methods that control this ratio in a manner that mimics human behavior, constraining agent learning mechanisms in a way similar to that observed in human cognition. This paper describes two novel methods for adjusting the exploration-exploitation ratio of agents, illustrated with a simple grid-world example and a two-armed bandit example.
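
The abstract does not spell out the paper's two methods, so they are not reproduced here. As a point of reference only, the sketch below shows what an exploration-exploitation ratio means on a two-armed bandit, using the standard epsilon-greedy rule as a stand-in. The function name, the payout probabilities, the step count, and the fixed `epsilon` are all assumptions chosen for illustration.

```python
import random


def run_two_armed_bandit(epsilon, payout_probs=(0.4, 0.6), steps=5000, seed=0):
    """Epsilon-greedy agent on a two-armed Bernoulli bandit (illustrative only).

    epsilon is the probability of exploring (pulling a random arm) on each
    step; otherwise the agent exploits the arm with the higher estimate.
    payout_probs are hypothetical win probabilities, not values from the paper.
    Returns (value_estimates, pull_counts, total_reward).
    """
    rng = random.Random(seed)
    estimates = [0.0, 0.0]  # running mean reward per arm
    counts = [0, 0]         # number of pulls per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)  # explore: pick an arm uniformly at random
        else:
            # exploit: pick the arm with the higher estimate (ties go to arm 0)
            arm = 0 if estimates[0] >= estimates[1] else 1

        reward = 1 if rng.random() < payout_probs[arm] else 0
        counts[arm] += 1
        # incremental update of the sample mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward
```

With `epsilon = 0.0` this agent never pulls the second arm: ties break toward arm 0, and the second arm's estimate never rises above zero. With `epsilon = 1.0` it samples both arms at random and never uses what it has learned. Values in between set the ratio that the abstract refers to.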