Asynchronous Stochastic Approximation and Q-Learning. Machine Learning.
The dynamics of reinforcement learning in cooperative multiagent systems. AAAI '98/IAAI '98: Proceedings of the Fifteenth National Conference on Artificial Intelligence/Tenth Conference on Innovative Applications of Artificial Intelligence.
Introduction to Reinforcement Learning.
Nash q-learning for general-sum stochastic games. The Journal of Machine Learning Research.
ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning.
Planning, learning and coordination in multiagent decision processes. TARK '96: Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge.
Learning of coordination: exploiting sparse interactions in multiagent systems. Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Volume 2.
Learning multi-agent state space representations. Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Volume 1.
Decentralized Learning in Markov Games. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.
One of the main advantages of reinforcement learning is its ability to deal with a delayed reward signal. Using an appropriate backup diagram, rewards are propagated back through the state space. This allows agents to learn to take the action that yields the highest future (discounted) reward, even if that action yields a suboptimal immediate reward in the current state. In a multi-agent environment, agents can apply the same principles as in single-agent RL, but must do so in the complete joint-state/joint-action space to guarantee optimality. Learning in such a state space, however, can be very slow. In this paper we present our approach for mitigating this problem. Future Coordinating Q-learning (FCQ-learning) detects strategic interactions between agents several time steps before these interactions occur. FCQ-learning uses the same principles as CQ-learning [3] to detect the states in which interaction is required, but does so several time steps before the interaction is reflected in the reward signal. In these states, the algorithm augments the agent's state information with information about the other agents, and this augmented state is used to select actions. The techniques presented in this paper are the first to explicitly deal with a delayed reward signal when learning with sparse interactions.
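To make the mechanism concrete, below is a minimal sketch of this kind of sparse-interaction learner. It is an illustration under stated assumptions, not the paper's implementation: the class name SparseInteractionAgent, the flag_interaction hook, and the way the other agent's local state is folded into the augmented state are all hypothetical, and the detection step that FCQ-learning performs on the learning signal is reduced to a stub. The standard Q-learning backup in update is what propagates delayed rewards back through the state space.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

class SparseInteractionAgent:
    # Hypothetical sketch in the spirit of CQ-/FCQ-learning: the agent normally
    # learns over its own local state and only switches to an augmented
    # (local state + other agent's state) representation in states flagged
    # as requiring coordination.
    def __init__(self, actions):
        self.actions = actions
        self.q_local = defaultdict(float)   # Q-values over (local_state, action)
        self.q_joint = defaultdict(float)   # Q-values over ((local, other), action)
        self.augmented = set()              # local states flagged as interaction states

    def _table_and_key(self, state, other_state):
        # Use the augmented (joint) representation only where coordination was detected.
        if state in self.augmented:
            return self.q_joint, (state, other_state)
        return self.q_local, state

    def act(self, state, other_state):
        table, key = self._table_and_key(state, other_state)
        if random.random() < EPSILON:       # epsilon-greedy exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: table[(key, a)])

    def update(self, state, other_state, action, reward, next_state, next_other):
        # Standard Q-learning backup: the discounted maximum over the next
        # state propagates delayed rewards back through the state space.
        table, key = self._table_and_key(state, other_state)
        next_table, next_key = self._table_and_key(next_state, next_other)
        best_next = max(next_table[(next_key, a)] for a in self.actions)
        table[(key, action)] += ALPHA * (reward + GAMMA * best_next - table[(key, action)])

    def flag_interaction(self, state):
        # Stub for FCQ-learning's detection step: per the abstract, detection is
        # driven by the learning signal several time steps before the conflict
        # appears in the rewards; here it is simply an external hook.
        self.augmented.add(state)

In this sketch, detection is left external; the point of FCQ-learning is precisely to drive flag_interaction automatically, and early, from the learning signal itself, so that the more expensive augmented representation is only used in the few states where coordination actually matters.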