A general framework for interacting Bayes-optimally with self-interested agents using arbitrary parametric model and model prior

Authors:
Trong Nghia Hoang; Kian Hsiang Low
Affiliations:
Department of Computer Science, National University of Singapore, Republic of Singapore; Department of Computer Science, National University of Singapore, Republic of Singapore
Venue:
IJCAI '13: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
Year:
2013

Abstract

Recent advances in Bayesian reinforcement learning (BRL) have shown that Bayes-optimality is theoretically achievable by modeling the environment's latent dynamics using a Flat-Dirichlet-Multinomial (FDM) prior. In self-interested multiagent environments, however, the transition dynamics are mainly controlled by the other agent's stochastic behavior, for which FDM's independence and modeling assumptions do not hold. As a result, FDM allows the other agent's behavior neither to be generalized across different states nor to be specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL that integrates the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a fine-grained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multiagent reinforcement learning algorithms.
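
The generalization limitation described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a toy comparison, assuming a hypothetical two-action opponent, between an FDM-style model (an independent Dirichlet prior over actions in each state) and a simple parametric model in which one parameter theta = P(defect) is shared by all states and tracked with a grid posterior. The class names, action names, and state labels are all illustrative assumptions.

```python
from collections import defaultdict

ACTIONS = ("cooperate", "defect")  # hypothetical two-action opponent


class FDMOpponentModel:
    """FDM-style model: an independent symmetric Dirichlet prior per state."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        # Action counts are kept separately for every state.
        self.counts = defaultdict(lambda: {a: 0 for a in ACTIONS})

    def update(self, state, action):
        self.counts[state][action] += 1

    def predict(self, state, action):
        # Posterior predictive of a Dirichlet-multinomial for this state only.
        c = self.counts[state]
        total = sum(c.values()) + self.alpha * len(ACTIONS)
        return (c[action] + self.alpha) / total


class SharedParameterOpponentModel:
    """Parametric model (assumption): theta = P(defect) is the same in all states."""

    def __init__(self, grid_size=101):
        self.thetas = [i / (grid_size - 1) for i in range(grid_size)]
        self.weights = [1.0 / grid_size] * grid_size  # uniform prior on the grid

    def update(self, state, action):
        # The state is ignored because theta is assumed to be state-independent.
        likelihoods = [t if action == "defect" else 1.0 - t for t in self.thetas]
        unnormalized = [w * l for w, l in zip(self.weights, likelihoods)]
        z = sum(unnormalized)
        self.weights = [u / z for u in unnormalized]

    def predict(self, state, action):
        p_defect = sum(w * t for w, t in zip(self.weights, self.thetas))
        return p_defect if action == "defect" else 1.0 - p_defect


fdm = FDMOpponentModel()
shared = SharedParameterOpponentModel()

# Observe the opponent defecting eight times, always in state "s0".
for _ in range(8):
    fdm.update("s0", "defect")
    shared.update("s0", "defect")

# Query a state "s1" that has never been visited.
fdm_unseen = fdm.predict("s1", "defect")
shared_unseen = shared.predict("s1", "defect")
print(fdm_unseen, shared_unseen)
```

In the unvisited state, the FDM-style model returns its prior mean of 0.5 because the counts from "s0" do not transfer, whereas the shared-parameter model predicts a defect probability of roughly 0.9. The shared-parameter assumption is only one example of how domain knowledge might be encoded. The framework in the paper concerns general parametric models and priors, and this toy does not reproduce it.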