Bounds on Sample Size for Policy Evaluation in Markov Environments

Authors:
Leonid Peshkin; Sayan Mukherjee
Venue:
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Year:
2001

Abstract

Reinforcement learning means finding the optimal course of action in Markovian environments without knowledge of the environment's dynamics. Stochastic optimization algorithms used in the field rely on estimates of the value of a policy. Typically, the value of a policy is estimated from the results of simulating that very policy in the environment. This approach requires a large amount of simulation, because every new point in the policy space that is considered has to be simulated anew. In this paper, we develop value estimators that use data gathered while executing one policy to estimate the value of another policy, resulting in much more data-efficient algorithms. We consider the question of how much experience must be accumulated and give PAC-style bounds on the required sample size.
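The abstract does not spell out the estimator or the bound. One standard way to reuse data from a behavior policy to evaluate a different target policy is importance sampling: each episode's return is reweighted by the ratio of the action probabilities under the two policies. The environment's transition probabilities cancel out of that ratio, so no model of the dynamics is required. The Python sketch below illustrates this idea and pairs it with a plain Hoeffding sample-size calculation. It assumes reactive policies that expose action probabilities, i.i.d. episodes, nonnegative bounded rewards, and a behavior policy that assigns nonzero probability to every action the target policy might take. It is an illustration only, not the paper's estimator or its bound. The one-step environment and all function and policy names (`importance_weight`, `is_value_estimate`, `hoeffding_sample_size`, `sample_episode`, `uniform`, `mostly_one`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

# A trajectory is a list of (observation, action, reward) triples.
Trajectory = List[Tuple[int, int, float]]
# Hypothetical policy interface: probability of `action` given `observation`.
Policy = Callable[[int, int], float]


def importance_weight(traj: Trajectory, target: Policy, behavior: Policy) -> float:
    """Likelihood ratio of the episode's actions under target vs. behavior.

    Transition probabilities appear in both numerator and denominator and
    cancel, so only the two policies' action probabilities are needed.
    """
    w = 1.0
    for obs, act, _ in traj:
        w *= target(obs, act) / behavior(obs, act)
    return w


def is_value_estimate(trajs: List[Trajectory], target: Policy, behavior: Policy) -> float:
    """Importance-sampling estimate of the target policy's expected return,
    computed from episodes generated by the behavior policy."""
    if not trajs:
        raise ValueError("need at least one episode")
    total = 0.0
    for traj in trajs:
        ret = sum(r for _, _, r in traj)
        total += importance_weight(traj, target, behavior) * ret
    return total / len(trajs)


def hoeffding_sample_size(value_range: float, eps: float, delta: float) -> int:
    """Episodes n such that |estimate - true value| <= eps with probability
    >= 1 - delta, for i.i.d. terms in an interval of width value_range.
    From Hoeffding: 2 * exp(-2 n eps^2 / range^2) <= delta."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


def sample_episode(rng: random.Random, behavior: Policy) -> Trajectory:
    """Hypothetical one-step environment: observation 0, actions {0, 1},
    reward 1.0 for action 1 and 0.0 for action 0."""
    act = 0 if rng.random() < behavior(0, 0) else 1
    reward = 1.0 if act == 1 else 0.0
    return [(0, act, reward)]


def uniform(obs: int, act: int) -> float:
    """Behavior policy: picks each of the two actions with probability 0.5."""
    return 0.5


def mostly_one(obs: int, act: int) -> float:
    """Target policy: picks action 1 with probability 0.9 (true value 0.9)."""
    return 0.9 if act == 1 else 0.1


rng = random.Random(0)
data = [sample_episode(rng, uniform) for _ in range(20000)]
# Evaluate mostly_one using only episodes collected while following uniform.
estimate = is_value_estimate(data, mostly_one, uniform)
# Reweighted returns lie in [0, 1.8], since the largest weight is 0.9 / 0.5.
n_needed = hoeffding_sample_size(value_range=1.8, eps=0.05, delta=0.05)
```

In this toy setting, evaluating the target policy by simulating it directly keeps returns in [0, 1], and the same Hoeffding calculation asks for 738 episodes. Reusing the uniform policy's data widens the range to [0, 1.8] and raises the requirement to 2391 episodes, but no new simulation is needed for each policy considered. This is the kind of trade-off that a sample-size bound makes explicit.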