We consider an agent interacting with an unmodeled environment. At each time step, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence future observations and costs. The goal is to minimize the long-term average cost. We propose a novel algorithm, known as the active LZ algorithm, for optimal control based on ideas from the Lempel-Ziv scheme for universal data compression and prediction. We establish that, under the active LZ algorithm, if there exists an integer K such that the future is conditionally independent of the past given a window of K consecutive actions and observations, then the average cost converges to the optimum. Experimental results involving the game of Rock-Paper-Scissors illustrate the merits of the algorithm.
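To make the Lempel-Ziv connection concrete, the following is a minimal, hypothetical sketch of the prediction component only: an LZ78-style parse tree with visit counts, used to estimate the next symbol of an action-observation stream. It is an illustration of the underlying idea, not the active LZ algorithm itself (which additionally folds in costs and action selection); the class name, smoothing rule, and interface are assumptions made for this example.

```python
class LZ78Predictor:
    """Illustrative LZ78-style predictor (not the full active LZ algorithm).

    The symbol stream is parsed into phrases, as in Lempel-Ziv
    compression; visit counts stored along the current phrase's path in
    the parse tree are used to estimate the next-symbol distribution.
    """

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.root = self._new_node()
        self.node = self.root  # position within the current phrase

    @staticmethod
    def _new_node():
        return {"children": {}, "count": 0}

    def predict(self):
        """Next-symbol distribution at the current node, with add-one
        (Laplace) smoothing so unseen symbols keep nonzero mass."""
        counts = {a: 1 for a in self.alphabet}
        for sym, child in self.node["children"].items():
            counts[sym] += child["count"]
        total = sum(counts.values())
        return {a: c / total for a, c in counts.items()}

    def update(self, symbol):
        """Advance along the tree; when the phrase leaves the tree,
        add a new leaf and restart at the root (an LZ78 phrase boundary)."""
        kids = self.node["children"]
        if symbol in kids:
            kids[symbol]["count"] += 1
            self.node = kids[symbol]
        else:
            kids[symbol] = self._new_node()
            kids[symbol]["count"] = 1
            self.node = self.root
```

For example, after observing the stream `abab`, the predictor's root node has seen `a` more often than `b`, so it assigns `a` a higher smoothed probability for the next symbol.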