An agent that interacts with other agents in a multi-agent system can benefit significantly from adapting to them. When performing active learning, every action the agent takes affects the interaction process in two ways: it affects the expected reward given the agent's current knowledge, and it affects the knowledge the agent acquires, and hence the rewards it can expect in the future. The agent must therefore trade off exploiting its current knowledge against exploring alternatives that may improve its knowledge and lead to better decisions later. The goal of this work is to develop exploration strategies for a model-based learning agent that handles its encounters with other agents in a common environment. We first show how to incorporate exploration methods commonly used in reinforcement learning into model-based learning. We then demonstrate the risk involved in exploration: an exploratory action can yield a better model of the other agent, but it can also put the agent in a much worse position.

We present a lookahead-based exploration strategy that evaluates actions according to their expected utility, their expected contribution to the acquired knowledge, and the risk they carry. Instead of holding a single model, the agent maintains a mixed opponent model: a belief distribution over a set of models that reflects its uncertainty about the opponent's strategy. Every action is evaluated according to its long-run contribution to the expected utility and to the knowledge of the opponent's strategy. Risky actions are more likely to be detected by considering their expected outcomes under the alternative models of the opponent's behavior. We present an efficient algorithm that returns an almost-optimal exploration plan against the mixed model, prove its correctness, and analyze its complexity.

We report experimental results in the Iterated Prisoner's Dilemma domain, comparing the capabilities of the different exploration strategies. The experiments demonstrate the superiority of lookahead-based exploration over the other exploration methods.
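To make the mixed-model idea concrete, here is a minimal sketch in Python of a belief distribution over opponent models with lookahead-based action selection in the Iterated Prisoner's Dilemma. Everything in it (the three candidate opponent strategies, the payoff values, the discount factor, and the naive depth-limited recursion) is an illustrative assumption, not the paper's algorithm; in particular, the paper's efficient planning algorithm avoids the exponential blowup of this brute-force lookahead.

```python
ACTIONS = ("C", "D")  # cooperate / defect

# Conventional IPD payoffs for the row player (assumed values).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Hypothetical candidate opponent models: each maps the agent's previous
# action to the probability that the opponent cooperates this round.
MODELS = {
    "tit_for_tat": lambda prev: 1.0 if prev == "C" else 0.0,
    "always_defect": lambda prev: 0.0,
    "random": lambda prev: 0.5,
}

def update_belief(belief, prev_action, observed):
    """Bayesian update of the belief distribution over opponent models,
    given the opponent's observed action this round."""
    posterior = {}
    for name, model in MODELS.items():
        p_coop = model(prev_action)
        likelihood = p_coop if observed == "C" else 1.0 - p_coop
        posterior[name] = belief[name] * likelihood
    total = sum(posterior.values())
    if total == 0.0:  # observation ruled out every model; keep the prior
        return dict(belief)
    return {name: w / total for name, w in posterior.items()}

def q_value(belief, prev_action, action, depth, discount=0.95):
    """Expected discounted return of `action`, averaging the opponent's
    reply over the mixed model and recursing on the updated belief.
    Because the average runs over *all* models the agent still believes
    in, an action that looks good only under an optimistic model is
    penalized, which is how risky actions surface in this sketch."""
    p_coop = sum(b * MODELS[m](prev_action) for m, b in belief.items())
    value = 0.0
    for opp, p_opp in (("C", p_coop), ("D", 1.0 - p_coop)):
        if p_opp == 0.0:
            continue
        future = 0.0
        if depth > 1:
            next_belief = update_belief(belief, prev_action, opp)
            future = max(q_value(next_belief, action, a, depth - 1, discount)
                         for a in ACTIONS)
        value += p_opp * (PAYOFF[(action, opp)] + discount * future)
    return value

def choose_action(belief, prev_action, depth=3):
    """Lookahead-based choice: maximize expected long-run utility against
    the mixed model. Information value enters implicitly, since actions
    that sharpen the belief improve the recursive continuation values."""
    return max(ACTIONS, key=lambda a: q_value(belief, prev_action, a, depth))

if __name__ == "__main__":
    belief = {name: 1.0 / len(MODELS) for name in MODELS}
    print(choose_action(belief, prev_action="C", depth=3))
```

Note how the exploratory value of an action appears only through the lookahead: cooperating after a defection distinguishes tit-for-tat from always-defect, so the recursion credits that action with better-informed future choices, while the averaging over alternative models charges it for the worst-case replies it might provoke.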