Bayesian Q-learning

Authors:
Richard Dearden;Nir Friedman;Stuart Russell
Venue:
AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence
Year:
1998

Abstract

A central problem in learning in complex environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality that might arise from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper, we adopt a Bayesian approach to maintaining this uncertain information. We extend Watkins' Q-learning by maintaining and propagating probability distributions over the Q-values. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation. We establish the convergence properties of our algorithm and show experimentally that it can exhibit substantial improvements over other well-known model-free exploration strategies.