Reinforcement learning means finding the optimal course of action in Markovian environments without knowledge of the environment's dynamics. Stochastic optimization algorithms used in the field rely on estimates of the value of a policy. Typically, the value of a policy is estimated from the results of simulating that very policy in the environment. This approach requires a large amount of simulation, because each point considered in the policy space needs its own simulation runs. In this paper, we develop value estimators that use data gathered under one policy to estimate the value of another policy, resulting in much more data-efficient algorithms. We consider the question of accumulating sufficient experience and give PAC-style bounds.
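To make the idea of reusing data across policies concrete, here is a minimal sketch of one way such an estimator could look, assuming a plain likelihood-ratio (importance sampling) weighting of episodic returns. The abstract does not spell out the estimator, so the function name `importance_sampling_value`, the trajectory format, and the toy one-step problem are all hypothetical choices made for illustration, not the paper's construction.

```python
from typing import Callable, List, Tuple

# Hypothetical data format: one step is (observation, action, reward).
Step = Tuple[int, int, float]
# A policy returns the probability of taking `action` given `observation`.
Policy = Callable[[int, int], float]


def importance_sampling_value(
    trajectories: List[List[Step]],
    target: Policy,
    behavior: Policy,
) -> float:
    """Estimate the target policy's value from behavior-policy trajectories.

    Each trajectory's undiscounted return is reweighted by the product of
    per-step probability ratios target/behavior. This assumes the behavior
    policy gives nonzero probability to every action that was taken.
    """
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0
        episode_return = 0.0
        for observation, action, reward in trajectory:
            weight *= target(observation, action) / behavior(observation, action)
            episode_return += reward
        total += weight * episode_return
    return total / len(trajectories)


# Toy one-step problem (an assumption for illustration):
# action 0 yields reward 1.0, action 1 yields reward 0.0.
def behavior(observation: int, action: int) -> float:
    return 0.5  # uniform over the two actions


def target(observation: int, action: int) -> float:
    return 0.9 if action == 0 else 0.1


# Two episodes gathered under the behavior policy, one per action.
data = [[(0, 0, 1.0)], [(0, 1, 0.0)]]
estimate = importance_sampling_value(data, target, behavior)
print(estimate)
```

In this toy setting the same two episodes can be reweighted for any other target policy without further simulation, which is the source of the data efficiency described above. The variance of such weights grows with episode length, which is one reason a bound on the amount of experience needed is of interest.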