Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems
Simple distributed strategies that modify the behavior of selfish individuals in a manner that enhances cooperation or global efficiency have proved difficult to identify. We consider a network of selfish agents, each of which optimizes its individual utility by coordinating (or anticoordinating) with its neighbors to maximize the payoffs from randomly weighted pairwise games. In general, each agent will opt for the behavior that is the best compromise (for itself) among the many conflicting constraints created by its neighbors, but the attractors of the system as a whole will not, in general, maximize total utility. We then consider agents that act as creatures of habit, increasing their preference to coordinate (anticoordinate) with whichever neighbors they are currently coordinated (anticoordinated) with. These preferences change slowly while the system is repeatedly perturbed, so that it settles to many different local attractors. We find that under these conditions, each perturbation gives a progressively higher chance of the system settling to a configuration with high total utility. Eventually, only one attractor remains, and that attractor is very likely to maximize (or nearly maximize) global utility. This counterintuitive result can be understood using theory from computational neuroscience: we show that this simple form of habituation is equivalent to Hebbian learning, and that the improved optimization of global utility results from well-known generalization capabilities of associative memory acting at the network scale. The system of selfish agents, each acting individually but habitually, thereby collectively identifies configurations that maximize total utility.
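The dynamics described above can be sketched as a small simulation. Agents hold ±1 states on a randomly weighted symmetric network, relax by asynchronous best response (each agent flips whenever that raises its own payoff), and after each settling episode a slow Hebbian update habituates their pairwise preferences. All parameter values, variable names, and the NumPy formulation below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                        # number of agents (illustrative size)
W = rng.normal(size=(N, N))   # random pairwise game weights
W = (W + W.T) / 2             # symmetric payoffs, as in a coordination game
np.fill_diagonal(W, 0)

L = np.zeros((N, N))          # habituated (learned) preference adjustments
RATE = 0.01                   # slow habituation rate

def relax(s, M):
    """Asynchronous best response: each agent flips its state whenever
    that strictly raises its own utility, until no agent can improve."""
    changed = True
    while changed:
        changed = False
        for i in range(N):
            # agent i's utility is s[i] * (M[i] @ s); negative means flip helps
            if s[i] * (M[i] @ s) < 0:
                s[i] = -s[i]
                changed = True
    return s

def global_utility(s):
    """Total payoff over all pairs, measured on the ORIGINAL weights W."""
    return 0.5 * s @ W @ s

utilities = []
for trial in range(60):
    s = rng.choice([-1, 1], size=N)   # perturbation: random restart
    s = relax(s, W + L)               # agents follow original + habituated weights
    utilities.append(global_utility(s))
    L += RATE * np.outer(s, s)        # Hebbian habituation on the settled state
    np.fill_diagonal(L, 0)
```

Because the habituated weights `L` accumulate Hebbian correlations from many locally settled configurations, later restarts tend to fall into higher-utility attractors (as measured on the original `W`), which is the generalization effect the abstract attributes to associative memory at the network scale.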