The dynamics of reinforcement learning in cooperative multiagent systems
Proceedings of the Fifteenth National Conference on Artificial Intelligence and Tenth Conference on Innovative Applications of Artificial Intelligence (AAAI '98/IAAI '98)
Introduction to Reinforcement Learning
A multiagent reinforcement learning algorithm with non-linear dynamics
Journal of Artificial Intelligence Research
Emergence of norms through social learning
Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI '07)
Frequency adjusted multi-agent Q-learning
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), Volume 1
Emergence and stability of social conventions in conflict situations
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI '11), Volume One
Q-learning is a popular reinforcement learning algorithm that is proven to converge to optimal policies in Markov decision processes. However, Q-learning exhibits artifacts when the optimal action is played with low probability, a situation that can arise from poor initialization of Q-values or from convergence to a nearly pure policy after which a change in the environment makes another action optimal. These artifacts were addressed in the literature by the variant Frequency Adjusted Q-learning (FAQL), but FAQL suffers from practical concerns that restrict the improved behavior to a subspace of the policy space. Here we introduce Repeated Update Q-learning (RUQL), a variant of Q-learning that resolves the undesirable artifacts of Q-learning without the practical concerns of FAQL. We show, both theoretically and experimentally, the similarities and differences between RUQL and FAQL (the closest state of the art). Experimental results verify the theoretical insights and show how RUQL outperforms both FAQL and Q-learning in non-stationary environments.
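To make the comparison concrete, below is a minimal tabular sketch of the three update rules as they are commonly stated in this line of work. It is illustrative only: the `beta` cap in `faq_update` and the closed-form repetition count in `ruq_update` are assumptions about the standard formulations of FAQL and the repeated-update idea, not the paper's exact pseudocode, and all names are hypothetical.

```python
def q_update(q, alpha, target):
    """Standard Q-learning: move Q(s, a) one step of size alpha toward
    the bootstrap target r + gamma * max_a' Q(s', a')."""
    return q + alpha * (target - q)


def faq_update(q, alpha, target, pi_a, beta=0.1):
    """Frequency Adjusted Q-learning (FAQL): scale the step by
    beta / pi(a|s), capped at 1 to keep the effective rate valid.
    The cap is the practical concern noted in the abstract: once
    pi(a|s) < beta, the adjustment saturates and the update
    degenerates to plain Q-learning."""
    rate = min(beta / pi_a, 1.0) * alpha
    return q + rate * (target - q)


def ruq_update(q, alpha, target, pi_a):
    """Repeated Update Q-learning (RUQL), assumed form: repeat the
    standard update 1 / pi(a|s) times toward the same target.
    n repeated steps collapse to the closed form
    q <- (1-alpha)^n * q + (1 - (1-alpha)^n) * target,
    so rarely played actions still receive a full-strength correction
    without an explicit loop (n need not be an integer)."""
    n = 1.0 / pi_a
    decay = (1.0 - alpha) ** n
    return decay * q + (1.0 - decay) * target


# A rarely played action (pi(a|s) = 0.05) whose true value has shifted to 1:
q, alpha, target, pi_a = 0.0, 0.1, 1.0, 0.05
print(q_update(q, alpha, target))           # 0.1   -- barely moves
print(faq_update(q, alpha, target, pi_a))   # 0.1   -- cap active: same as QL
print(ruq_update(q, alpha, target, pi_a))   # ~0.88 -- compensates for rarity
```

The printed values illustrate the abstract's point: for a rarely selected action whose value has changed, plain Q-learning updates slowly, FAQL behaves identically once its cap is active (the limited policy subspace), while the repeated-update rule applies the correction as if the action were played at every step.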