Research on improvement of model-free average reward reinforcement learning and its simulation experiment

  • Authors:
  • Wei Chen; Zhenkun Zhai; Xiong Li; Jing Guo; Jie Wang

  • Affiliations:
  • Faculty of Automation, Guangdong University of Technology, Guangzhou (all authors)

  • Venue:
  • CCDC'09: Proceedings of the 21st Annual International Conference on Chinese Control and Decision Conference
  • Year:
  • 2009

Abstract

Traditional reinforcement learning emphasizes the independent learning of a single agent. Considering the relationship between independent learning and group learning in a Multi-Agent System (MAS), this paper presents a hybrid algorithm based on average-reward reinforcement learning. The modified algorithm still attends to independent learning, but in order to select actions that reflect the multi-agent environment, the learning agent augments the current environmental state with observed information and a prediction of the other agents' actions when choosing its own action. The advantage of this design is twofold: the agent learns an optimal policy through autonomous study, and, as a member of the MAS, its learning process is integrated into the whole multi-agent environment. The RoboCup simulation league (2D) is a typical multi-agent system; by applying the new method to the training of the players, we demonstrate the feasibility and validity of the algorithm.
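The paper itself does not publish code; as a rough illustration of the two ingredients the abstract names, the sketch below combines a standard average-reward method (Schwartz-style tabular R-learning, not necessarily the authors' exact update) with a state augmented by a predicted teammate action. The toy task, reward, and all parameter values are assumptions for illustration only, not the RoboCup setup used in the paper.

```python
import random

def r_learning(steps=5000, alpha=0.1, beta=0.01, eps=0.1, seed=0):
    """Tabular average-reward R-learning on a toy coordination task.

    The agent's state is a pair (env_state, predicted_teammate_action):
    a stand-in for the paper's idea of augmenting the current
    environmental state with observations and predictions of the
    other agents' actions.
    """
    rng = random.Random(seed)
    # Toy task (assumed): one environment state; the teammate's predicted
    # action is 0 or 1, and the learner is rewarded for matching it.
    states = [(0, 0), (0, 1)]
    actions = [0, 1]
    Q = {(s, a): 0.0 for s in states for a in actions}
    rho = 0.0  # running estimate of the average reward per step

    s = states[0]
    for _ in range(steps):
        # Epsilon-greedy action selection over the augmented state.
        if rng.random() < eps:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda x: Q[(s, x)])
        greedy = Q[(s, a)] == max(Q[(s, x)] for x in actions)

        r = 1.0 if a == s[1] else 0.0   # reward for matching the prediction
        s2 = (0, rng.randint(0, 1))     # teammate's next predicted action
        best_next = max(Q[(s2, x)] for x in actions)
        best_here = max(Q[(s, x)] for x in actions)

        # R-learning update: temporal difference uses (r - rho), not a
        # discount factor, as befits the average-reward criterion.
        Q[(s, a)] += alpha * (r - rho + best_next - Q[(s, a)])
        if greedy:  # the average-reward estimate tracks greedy steps only
            rho += beta * (r - rho + best_next - best_here)
        s = s2
    return Q, rho
```

Under this toy reward, the learned policy should match the predicted teammate action in each augmented state, and `rho` should approach the optimal average reward of 1.0; the same skeleton would apply to a richer state that also carries observed field information.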