In continuous learning settings, stochastic, stable policies are often necessary to ensure that agents keep adapting to dynamic environments. The choice of decentralised learning system and policy plays an important role in the optimisation task. For example, a policy that exhibits fluctuations may introduce non-linear effects that other agents in the environment cannot cope with and may even amplify. In dynamic and unpredictable multiagent environments, these oscillations can destabilise the system. In this paper, we take inspiration from the limbic system and introduce an extension to the weighted policy learner in which agents evaluate rewards as either positive or negative feedback, depending on how they deviate from the average expected reward. Each agent has a positive and a negative bias, which magnifies or depresses the corresponding feedback signal. To contain the non-linear effects of biased rewards, we incorporate a decaying memory of past positive and negative feedback signals, which yields a smoother gradient update on the probability simplex by spreading the effect of each feedback signal over time. Splitting the feedback signal also provides more leverage on the win or learn fast (WoLF) principle. The resulting cognitive policy learner is evaluated on a small queueing network and compared with the fair action learner and the weighted policy learner. Emphasis is placed on analysing the dynamics of the learning algorithms with respect to the stability of the queueing network and overall queueing performance.
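The core update described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name, bias values, decay rate, and learning rate below are all hypothetical, and the simplex projection uses the standard sort-based Euclidean projection rather than whatever renormalisation the paper actually employs.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

class CognitivePolicyLearner:
    """Illustrative sketch of the biased positive/negative feedback split
    with a decaying memory of past feedback signals (hypothetical parameters)."""

    def __init__(self, n_actions, pos_bias=1.2, neg_bias=0.8,
                 decay=0.9, lr=0.1, avg_rate=0.05):
        self.policy = np.full(n_actions, 1.0 / n_actions)
        self.avg_reward = 0.0                 # running estimate of expected reward
        self.pos_trace = np.zeros(n_actions)  # decaying memory of positive feedback
        self.neg_trace = np.zeros(n_actions)  # decaying memory of negative feedback
        self.pos_bias, self.neg_bias = pos_bias, neg_bias
        self.decay, self.lr, self.avg_rate = decay, lr, avg_rate

    def update(self, action, reward):
        # split the reward into positive/negative feedback relative to the average
        delta = reward - self.avg_reward
        pos, neg = max(delta, 0.0), max(-delta, 0.0)
        self.avg_reward += self.avg_rate * (reward - self.avg_reward)
        # decay old traces, then magnify/depress the new feedback via the biases
        self.pos_trace *= self.decay
        self.neg_trace *= self.decay
        self.pos_trace[action] += self.pos_bias * pos
        self.neg_trace[action] += self.neg_bias * neg
        # gradient step driven by the smoothed net feedback, kept on the simplex
        self.policy = project_to_simplex(
            self.policy + self.lr * (self.pos_trace - self.neg_trace))
```

Spreading the update over the decaying traces, rather than applying each raw reward directly, is what smooths the gradient step and damps the oscillations discussed above.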