Control of exploitation-exploration meta-parameter in reinforcement learning
Neural Networks - Computational models of neuromodulation
This article proposes a reinforcement learning (RL) method based on an actor-critic architecture, which can be applied to partially observable multi-agent competitive games. As an example, we deal with the card game “Hearts”. In our method, the actor selects actions so as to maximize the expected temporal-difference (TD) error, which is computed from an estimate of the state transition. The state transition is estimated by taking into account the inferred card distribution and the other players' action models.
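The action-selection rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a discrete state space, a hypothetical `transition_probs(state, action)` function standing in for the estimated state transition (which in the paper would come from the inferred card distribution and opponent models), and a softmax choice over expected TD errors.

```python
import numpy as np

def expected_td_error(state, action, V, transition_probs, reward_fn, gamma=0.95):
    """Expected TD error of an action, averaged over the estimated
    state-transition distribution (a dict: next_state -> probability)."""
    return sum(
        p * (reward_fn(state, action, s_next) + gamma * V[s_next] - V[state])
        for s_next, p in transition_probs(state, action).items()
    )

def select_action(state, actions, V, transition_probs, reward_fn, beta=1.0):
    """Softmax (Boltzmann) choice favoring actions with larger
    expected TD error; beta controls exploitation vs. exploration."""
    errs = np.array([
        expected_td_error(state, a, V, transition_probs, reward_fn)
        for a in actions
    ])
    logits = beta * errs
    probs = np.exp(logits - logits.max())  # subtract max for stability
    probs /= probs.sum()
    return actions[np.random.choice(len(actions), p=probs)]
```

Here `beta` plays the role of an exploitation-exploration meta-parameter: larger values concentrate the choice on the action with the highest expected TD error, while smaller values make the policy more exploratory.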