A Monte-Carlo AIXI approximation

Authors:
Joel Veness; Kee Siong Ng; Marcus Hutter; William Uther; David Silver
Affiliations:
University of New South Wales and National ICT Australia; The Australian National University; The Australian National University and National ICT Australia; National ICT Australia and University of New South Wales; Massachusetts Institute of Technology
Venue:
Journal of Artificial Intelligence Research
Year:
2011

Abstract

This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. Our approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a new Monte-Carlo Tree Search algorithm along with an agent-specific extension to the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a variety of stochastic and partially observable domains. We conclude by proposing a number of directions for future research.
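
The abstract names Context Tree Weighting (CTW) without describing it. As a rough illustration only, the sketch below implements the standard binary CTW predictor with a Krichevsky-Trofimov (KT) estimator at each node; it is not the agent-specific extension the paper introduces, and it includes no planning component. The class and method names (`CTNode`, `CTW`, `update`, `predict`) and the choice to pad the initial context with zeros are assumptions made for this example.

```python
import copy
import math
from collections import deque


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


class CTNode:
    """One context in the tree (hypothetical helper class)."""

    def __init__(self):
        self.counts = [0, 0]   # number of 0s and 1s observed in this context
        self.log_kt = 0.0      # log KT probability of the bits observed here
        self.log_pw = 0.0      # log weighted (mixture) probability
        self.children = {}     # context bit -> child node


class CTW:
    """Binary Context Tree Weighting over contexts of at most `depth` bits."""

    def __init__(self, depth):
        self.depth = depth
        self.root = CTNode()
        self.history = deque(maxlen=depth)  # most recent `depth` bits

    def update(self, bit):
        # Context = most recent bits first; zero-padded early on (an assumption).
        context = list(reversed(self.history))
        context += [0] * (self.depth - len(context))

        # Walk from the root down to the leaf for this context.
        path = [self.root]
        for c in context:
            path.append(path[-1].children.setdefault(c, CTNode()))

        # Update from the leaf back up to the root.
        for d in range(self.depth, -1, -1):
            node = path[d]
            # KT rule: P(bit | counts) = (count[bit] + 1/2) / (total + 1).
            total = node.counts[0] + node.counts[1]
            node.log_kt += math.log((node.counts[bit] + 0.5) / (total + 1.0))
            node.counts[bit] += 1
            if d == self.depth:
                node.log_pw = node.log_kt
            else:
                # Unvisited children contribute probability 1 (log 0.0).
                children = sum(ch.log_pw for ch in node.children.values())
                node.log_pw = math.log(0.5) + log_add(node.log_kt, children)

        self.history.append(bit)

    def log_prob(self):
        """Log probability the mixture assigns to all bits seen so far."""
        return self.root.log_pw

    def predict(self, bit):
        """Conditional probability that the next bit equals `bit`."""
        trial = copy.deepcopy(self)  # simple but slow; fine for a sketch
        trial.update(bit)
        return math.exp(trial.log_prob() - self.log_prob())


if __name__ == "__main__":
    model = CTW(depth=2)
    for i in range(40):
        model.update(i % 2)  # alternating sequence 0, 1, 0, 1, ...
    print(model.predict(0), model.predict(1))
```

The deep copy in `predict` keeps the example short; an implementation meant for repeated use inside a search procedure would more plausibly apply and then revert an update in place.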