Exploration Strategies for Model-based Learning in Multi-agent Systems: Exploration Strategies

Authors:
David Carmel; Shaul Markovitch
Affiliations:
Computer Science Department, Technion, Haifa 32000, Israel, carmel@cs.technion.ac.il; Computer Science Department, Technion, Haifa 32000, Israel, shaulm@cs.technion.ac.il
Venue:
Autonomous Agents and Multi-Agent Systems
Year:
1999

Abstract

An agent that interacts with other agents in a multi-agent system can benefit significantly from adapting to them. When the agent performs active learning, each of its actions affects the interaction process in two ways. It affects the expected reward, given the agent's current knowledge. It also affects the knowledge the agent acquires and, hence, the rewards the agent can expect to receive in the future. The agent must therefore trade off the wish to exploit its current knowledge against the wish to explore other alternatives, which improves its knowledge and leads to better decisions in the future. The goal of this work is to develop exploration strategies that a model-based learning agent can use in its encounters with other agents in a common environment. We first show how to incorporate exploration methods commonly used in reinforcement learning into model-based learning. We then demonstrate the risk involved in exploration: an exploratory action can yield a better model of the other agent, but it can also put the agent in a much worse position. We present the lookahead-based exploration strategy, which evaluates actions according to their expected utility, their expected contribution to the acquired knowledge, and the risk they carry. Instead of holding a single model, the agent maintains a mixed opponent model, a belief distribution over a set of models that reflects its uncertainty about the opponent's strategy. Every action is evaluated according to its long-run contribution to the expected utility and to the knowledge regarding the opponent's strategy. Risky actions are more likely to be detected because their expected outcomes are considered under each of the alternative models of the opponent's behavior.
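The idea of a mixed opponent model can be sketched in a few lines. This is a minimal illustration under assumptions that are not taken from the paper: the game is the Iterated Prisoner's Dilemma with the standard payoffs T=5, R=3, P=1, S=0; the candidate set is a hypothetical trio of opponent strategies (always-cooperate, Tit-for-Tat, and Grim trigger); and the belief is updated with Bayes' rule using a simple noisy likelihood. It shows how the belief shifts as opponent moves are observed and how the expected payoff of a move is averaged over the alternative models.

```python
from typing import Callable, Dict, List, Tuple

Move = str                          # 'C' (cooperate) or 'D' (defect)
History = List[Tuple[Move, Move]]   # past rounds as (learner_move, opponent_move)
Strategy = Callable[[History], Move]

# Assumed standard IPD payoffs for the learner: T=5, R=3, P=1, S=0.
PAYOFF: Dict[Tuple[Move, Move], float] = {
    ('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1,
}

# Hypothetical candidate opponent models (not the paper's model class).
def all_c(h: History) -> Move:
    return 'C'

def tit_for_tat(h: History) -> Move:
    # Repeat the learner's previous move; cooperate in the first round.
    return h[-1][0] if h else 'C'

def grim(h: History) -> Move:
    # Cooperate until the learner defects once, then defect forever.
    return 'D' if any(mine == 'D' for mine, _ in h) else 'C'

MODELS: Dict[str, Strategy] = {'AllC': all_c, 'TFT': tit_for_tat, 'Grim': grim}

def update_belief(belief: Dict[str, float], history: History,
                  observed: Move, noise: float = 0.05) -> Dict[str, float]:
    """Bayes' rule: reweight each model by how well it predicted the observed move.

    `history` is the interaction *before* the round in which `observed` was played.
    The assumed `noise` keeps a mispredicting model from dropping to zero at once.
    """
    posterior = {}
    for name, prior in belief.items():
        predicted = MODELS[name](history)
        likelihood = 1.0 - noise if predicted == observed else noise
        posterior[name] = prior * likelihood
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

def expected_payoff(belief: Dict[str, float], history: History, move: Move) -> float:
    """Immediate payoff of `move`, averaged over the models in the mixed belief."""
    return sum(p * PAYOFF[(move, MODELS[name](history))] for name, p in belief.items())

if __name__ == '__main__':
    belief = {name: 1 / 3 for name in MODELS}
    # Round 1: mutual cooperation; round 2: the learner explores by defecting.
    history: History = [('C', 'C'), ('D', 'C')]
    # Round 3: the opponent is observed to retaliate.
    belief = update_belief(belief, history, 'D')
    print(belief)  # AllC loses almost all its weight; TFT and Grim stay tied
```

After the exploratory defection, retaliation rules out the always-cooperate model but cannot yet separate Tit-for-Tat from Grim; a later cooperative move by the learner does. Grim is also what makes the exploratory defection risky: under that model the opponent never cooperates again, which is exactly the kind of outcome a single-model learner would fail to weigh.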
We present an efficient algorithm that returns an almost optimal exploration plan against the mixed model, and we provide a proof of its correctness and an analysis of its complexity. We report experimental results in the Iterated Prisoner's Dilemma domain that compare the capabilities of the different exploration strategies. The experiments demonstrate the superiority of lookahead-based exploration over the other exploration methods.
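Lookahead against a mixed model can be illustrated with a brute-force, depth-limited expectimax search. This is a hypothetical sketch and not the paper's efficient algorithm. It assumes deterministic candidate models, so each assumed opponent move simply filters out the inconsistent models. It also assumes the standard IPD payoffs and a discount factor chosen only for illustration. Because each branch conditions the belief on the opponent move it assumes, the value of an action reflects both what the learner would learn from it and the payoff it risks under each alternative model.

```python
from typing import Callable, Dict, List, Optional, Tuple

Move = str                         # 'C' (cooperate) or 'D' (defect)
History = List[Tuple[Move, Move]]  # (learner_move, opponent_move) per round

# Assumed standard IPD payoffs for the learner: T=5, R=3, P=1, S=0.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

# Hypothetical deterministic opponent models.
MODELS: Dict[str, Callable[[History], Move]] = {
    'AllC': lambda h: 'C',
    'TFT': lambda h: h[-1][0] if h else 'C',
    'Grim': lambda h: 'D' if any(m == 'D' for m, _ in h) else 'C',
}

def lookahead(belief: Dict[str, float], history: History, depth: int,
              discount: float = 0.95) -> Tuple[float, Optional[Move]]:
    """Depth-limited expectimax against a mixed opponent model.

    Returns (expected discounted payoff, best first move). Each branch keeps only
    the models consistent with the opponent move it assumes, so the value of an
    action includes the information it reveals and the risk it carries.
    """
    if depth == 0:
        return 0.0, None
    # The opponent's move this round depends only on the history so far.
    predicted = {name: MODELS[name](history) for name in belief}
    best_value, best_move = float('-inf'), None
    for mine in ('C', 'D'):
        value = 0.0
        for theirs in ('C', 'D'):
            prob = sum(p for name, p in belief.items() if predicted[name] == theirs)
            if prob == 0.0:
                continue  # no model predicts this opponent move
            posterior = {name: p / prob for name, p in belief.items()
                         if predicted[name] == theirs}
            future, _ = lookahead(posterior, history + [(mine, theirs)],
                                  depth - 1, discount)
            value += prob * (PAYOFF[(mine, theirs)] + discount * future)
        if value > best_value:  # ties keep the earlier move, 'C'
            best_value, best_move = value, mine
    return best_value, best_move

if __name__ == '__main__':
    uniform = {name: 1 / 3 for name in MODELS}
    print(lookahead(uniform, [], depth=1, discount=1.0))  # (5.0, 'D')
    print(lookahead(uniform, [], depth=2, discount=1.0))  # (8.0, 'C')
```

With a uniform belief and no discounting, a one-step search prefers defection, whereas a two-step search prefers cooperation. After a defection, Tit-for-Tat and Grim both retaliate, so the second round is worth only 7/3 in expectation instead of 5. The search visits up to 4^depth leaves, which is why an efficient planning algorithm matters in practice.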