A minimum relative entropy principle for learning and acting

Authors:
Pedro A. Ortega; Daniel A. Braun
Affiliations:
Department of Engineering, University of Cambridge, Cambridge, UK; Department of Engineering, University of Cambridge, Cambridge, UK
Venue:
Journal of Artificial Intelligence Research
Year:
2010

Abstract

This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is designed specifically for a particular environment. The adaptive control problem is formalized as minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, the optimal solution is the well-known Bayesian predictor. If the agent is active, however, its past actions must be treated as causal interventions on the I/O stream rather than as ordinary probabilistic conditions. The paper shows that the solution to this new variational problem is a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. It further shows that, under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.
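
The mixture-of-experts idea in the abstract can be illustrated with a small Python sketch. This is one plausible reading of the rule, not the authors' implementation: at each step an expert is sampled according to its current posterior weight, the agent acts as that expert would, and the weights are then updated using only the likelihood of the observation. The agent's own action contributes no evidence, since it is treated as an intervention. The two-armed Bernoulli bandit, the finite hypothesis class, and the name `bayesian_control_rule` are all assumptions made for illustration.

```python
import random


def bayesian_control_rule(hypotheses, true_probs, steps, rng):
    """Illustrative sketch of a Bayesian-control-rule-style agent on a Bernoulli bandit.

    hypotheses: list of tuples; each tuple holds the reward probability of every
                arm under one candidate environment (hypothetical setup).
    true_probs: actual reward probabilities per arm, unknown to the agent.
    Returns the final posterior weights over the hypotheses.
    """
    n = len(hypotheses)
    posterior = [1.0 / n] * n  # uniform prior over experts

    for _ in range(steps):
        # Sample one expert according to the current posterior weights.
        m = rng.choices(range(n), weights=posterior)[0]

        # That expert's policy: pull the arm it believes is best.
        probs_m = hypotheses[m]
        action = max(range(len(probs_m)), key=lambda a: probs_m[a])

        # The environment responds with a binary reward.
        reward = 1 if rng.random() < true_probs[action] else 0

        # Update with the observation likelihood only. The action is treated
        # as an intervention, so no policy-likelihood term enters the update.
        likelihoods = [h[action] if reward else 1.0 - h[action] for h in hypotheses]
        unnormalized = [w * l for w, l in zip(posterior, likelihoods)]
        total = sum(unnormalized)
        posterior = [u / total for u in unnormalized]

    return posterior


if __name__ == "__main__":
    candidates = [(0.8, 0.2), (0.2, 0.8)]  # assumed hypothesis class
    final = bayesian_control_rule(candidates, (0.8, 0.2), 500, random.Random(0))
    print(final)
```

In this toy setting the posterior weight should concentrate on the hypothesis that matches the true environment, so the sampled actions increasingly coincide with those of the matching expert. The hypothesized probabilities are kept strictly between 0 and 1 so that no observation has zero likelihood under every hypothesis.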