Reinforcement learning systems must often balance exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality that arises from the information acquired by exploring. Estimating this quantity requires assessing the agent's uncertainty about its current value estimates for states. In this paper we investigate ways to represent and reason about this uncertainty in model-based algorithms, where the system attempts to learn a model of its environment. We explicitly represent uncertainty about the parameters of the model and induce probability distributions over Q-values from these. The distributions are used to compute a myopic approximation to the value of information for each action, and hence to select the action that best balances exploration and exploitation.
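The scheme described above can be sketched in code. The following is a minimal illustration, not the paper's implementation: it assumes a small discrete MDP with known rewards, represents model uncertainty as a Dirichlet posterior over transition probabilities (the count array `alpha` is a stand-in for whatever posterior the agent maintains), approximates the Q-value distribution by solving sampled models, and selects the action maximizing mean Q-value plus a myopic value-of-perfect-information term. The function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_q_values(alpha, rewards, gamma=0.9, n_models=50, n_iter=100):
    """Draw MDP models from a Dirichlet posterior over transitions and
    solve each by value iteration, yielding an empirical distribution
    over Q-values for every (state, action) pair."""
    n_s, n_a, _ = alpha.shape
    qs = np.empty((n_models, n_s, n_a))
    for m in range(n_models):
        # One transition model sampled from the posterior counts.
        T = np.array([[rng.dirichlet(alpha[s, a]) for a in range(n_a)]
                      for s in range(n_s)])          # (n_s, n_a, n_s)
        q = np.zeros((n_s, n_a))
        for _ in range(n_iter):
            v = q.max(axis=1)
            q = rewards + gamma * (T @ v)            # Bellman backup
        qs[m] = q
    return qs

def myopic_vpi(q_samples, s):
    """Myopic value of perfect information for each action in state s:
    the expected gain in decision quality if that action's true Q-value
    were revealed, estimated over the sampled models."""
    q = q_samples[:, s, :]                           # (n_models, n_a)
    means = q.mean(axis=0)
    best, second = np.sort(means)[::-1][:2]
    vpi = np.empty(q.shape[1])
    for a in range(q.shape[1]):
        if means[a] == best:
            # Gain from learning the apparent best action is overrated.
            vpi[a] = np.maximum(second - q[:, a], 0.0).mean()
        else:
            # Gain from learning this action beats the apparent best.
            vpi[a] = np.maximum(q[:, a] - best, 0.0).mean()
    return vpi

def select_action(q_samples, s):
    """Pick the action with the best mean Q-value plus exploration bonus."""
    means = q_samples[:, s, :].mean(axis=0)
    return int(np.argmax(means + myopic_vpi(q_samples, s)))
```

Because the VPI bonus shrinks as the posterior concentrates, exploration fades automatically once the model is well learned, without a hand-tuned exploration rate.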