Self-Organizing Neural Architectures and Cooperative Learning in a Multiagent Environment

Authors:
Dan Xiao;Ah-Hwee Tan
Affiliations:
Nanyang Technol. Univ., Singapore;-
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Year:
2007

Citing 0
Cited 4

Scaling Up Multi-agent Reinforcement Learning in Complex Domains

WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 02
Integrated cognitive architectures: a survey

Artificial Intelligence Review
A self-organizing neural architecture integrating desire, intention and reinforcement learning

Neurocomputing
A hybrid agent architecture integrating desire, intention and reinforcement learning

Expert Systems with Applications: An International Journal

Quantified Score

Hi-index	0.00

Visualization

Abstract

Temporal-difference-fusion architecture for learning, cognition, and navigation (TD-FALCON) is a generalization of adaptive resonance theory (a class of self-organizing neural networks) that incorporates TD methods for real-time reinforcement learning. In this paper, we investigate how a team of TD-FALCON networks may cooperate to learn and function in a dynamic multiagent environment based on minefield navigation and a predator/prey pursuit tasks. Experiments on the navigation task demonstrate that TD-FALCON agent teams are able to adapt and function well in a multiagent environment without an explicit mechanism of collaboration. In comparison, traditional Q-learning agents using gradient-descent-based feedforward neural networks, trained with the standard backpropagation and the resilient-propagation (RPROP) algorithms, produce a significantly poorer level of performance. For the predator/prey pursuit task, we experiment with various cooperative strategies and find that a combination of a high-level compressed state representation and a hybrid reward function produces the best results. Using the same cooperative strategy, the TD-FALCON team also outperforms the RPROP-based reinforcement learners in terms of both task completion rate and learning efficiency.