Elements of information theory
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Simulated Annealing: A Proof of Convergence. IEEE Transactions on Pattern Analysis and Machine Intelligence
On the undecidability of probabilistic planning and related stochastic optimization problems. Artificial Intelligence (special issue on planning with uncertainty and incomplete information)
Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence (special issue on planning with uncertainty and incomplete information)
Reinforcement learning with selective perception and hidden state
R-max - a general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research
Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability
Predictive state representations: a new theory for modeling dynamical systems. UAI '04: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence
Statistical and Inductive Inference by Minimum Message Length (Information Science and Statistics)
Probabilistic Finite-State Machines - Part I. IEEE Transactions on Pattern Analysis and Machine Intelligence
Stochastic Optimization (Scientific Computation)
The Minimum Description Length Principle (Adaptive Computation and Machine Learning)
Monte Carlo Strategies in Scientific Computing
Planning and acting in partially observable stochastic domains. Artificial Intelligence
Universal reinforcement learning. IEEE Transactions on Information Theory
Reinforcement learning with perceptual aliasing: the perceptual distinctions approach. AAAI'92: Proceedings of the Tenth National Conference on Artificial Intelligence
Consistency of feature Markov processes. ALT'10: Proceedings of the 21st International Conference on Algorithmic Learning Theory
A Monte-Carlo AIXI approximation. Journal of Artificial Intelligence Research
Bandit based Monte-Carlo planning. ECML'06: Proceedings of the 17th European Conference on Machine Learning
A universal data compression system. IEEE Transactions on Information Theory
The context-tree weighting method: basic properties. IEEE Transactions on Information Theory
Following a recent surge in the use of history-based methods for resolving perceptual aliasing in reinforcement learning, we introduce an algorithm based on the feature reinforcement learning framework ΦMDP [13]. To obtain a practical algorithm, we devise a stochastic search procedure over a class of context trees, based on parallel tempering and a specialized proposal distribution. We provide the first empirical evaluation of ΦMDP. Our algorithm outperforms the classical U-tree algorithm [20] and the recent active-LZ algorithm [6], and is competitive with MC-AIXI-CTW [29], which maintains a Bayesian mixture over all context trees up to a chosen depth. We are encouraged that we can compete with this sophisticated method using an algorithm that simply picks a single model and runs Q-learning on the corresponding MDP; our ΦMDP algorithm is simpler and consumes less time and memory. These results are promising for our future work on larger and more complex problems.
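As a rough illustration of the search component described above, the sketch below implements generic parallel tempering: several Metropolis chains run at different temperatures and periodically swap states, which helps the coldest chain escape local minima. This is a minimal, hedged sketch, not the paper's actual ΦMDP search; the cost function, proposal, and toy integer state space here are invented for illustration (in the paper the states would be context trees scored by the ΦMDP cost and mutated by the specialized proposal distribution).

```python
import math
import random

def parallel_tempering(cost, propose, init, temps, steps, seed=0):
    """Minimise `cost` with one Metropolis chain per temperature in `temps`
    (sorted from cold to hot), attempting replica swaps between rounds."""
    rng = random.Random(seed)
    states = [init() for _ in temps]      # one replica per temperature
    costs = [cost(s) for s in states]
    best, best_cost = states[0], costs[0]
    for _ in range(steps):
        # Metropolis update within each chain.
        for i, T in enumerate(temps):
            cand = propose(states[i], rng)
            c = cost(cand)
            if c <= costs[i] or rng.random() < math.exp((costs[i] - c) / T):
                states[i], costs[i] = cand, c
            if costs[i] < best_cost:
                best, best_cost = states[i], costs[i]
        # Attempt to swap a random adjacent pair of replicas; the standard
        # acceptance rule uses the inverse-temperature and cost differences.
        i = rng.randrange(len(temps) - 1)
        d = (1.0 / temps[i] - 1.0 / temps[i + 1]) * (costs[i] - costs[i + 1])
        if d >= 0 or rng.random() < math.exp(d):
            states[i], states[i + 1] = states[i + 1], states[i]
            costs[i], costs[i + 1] = costs[i + 1], costs[i]
    return best, best_cost

# Toy usage: minimise a multimodal function over the integers.
f = lambda x: (x % 7) + 0.1 * abs(x - 30)
state, value = parallel_tempering(
    cost=f,
    propose=lambda x, rng: x + rng.choice([-3, -1, 1, 3]),
    init=lambda: 0,
    temps=[0.2, 0.5, 1.0, 2.0],
    steps=2000,
)
```

The hot chains explore broadly while the cold chain exploits; the swap move is what lets a good state discovered at high temperature migrate down to the cold chain.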