Eligibility Traces for Off-Policy Policy Evaluation

Authors:
Doina Precup;Richard S. Sutton;Satinder P. Singh
Venue:
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Year:
2000

Citing 0
Cited 29

Bounds on Sample Size for Policy Evaluation in Markov Environments

COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
Combining importance sampling and temporal difference control variates to simulate Markov Chains

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Reinforcement Learning with Approximation Spaces

Fundamenta Informaticae
Adaptive Importance Sampling Technique for Markov Chains Using Stochastic Approximation

Operations Research
Approximation spaces in off-policy Monte Carlo learning

Engineering Applications of Artificial Intelligence
Learning state-action basis functions for hierarchical MDPs

Proceedings of the 24th international conference on Machine learning
Reinforcement learning in the presence of rare events

Proceedings of the 25th international conference on Machine learning
Geodesic Gaussian kernels for value function approximation

Autonomous Robots
Efficient Sample Reuse in EM-Based Policy Search

ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
Adaptive importance sampling with automatic model selection in value function approximation

AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 3
Building portable options: skill transfer in reinforcement learning

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Active policy iteration: efficient exploration through active learning for value function approximation in reinforcement learning

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Adaptive importance sampling for value function approximation in off-policy reinforcement learning

Neural Networks
Least absolute policy iteration for robust value function approximation

ICRA'09 Proceedings of the 2009 IEEE international conference on Robotics and Automation
A contextual-bandit approach to personalized news article recommendation

Proceedings of the 19th international conference on World wide web
Efficient exploration through active learning for value function approximation in reinforcement learning

Neural Networks
Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

Proceedings of the fourth ACM international conference on Web search and data mining
Kalman temporal differences

Journal of Artificial Intelligence Research
Reinforcement learning with partially known world dynamics

UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
Policy improvement for POMDPs using normalized importance sampling

UAI'01 Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence
Reward-weighted regression with sample reuse for direct policy search in reinforcement learning

Neural Computation
Recursive least-squares learning with eligibility traces

EWRL'11 Proceedings of the 9th European conference on Recent Advances in Reinforcement Learning
Transfer in reinforcement learning via shared features

The Journal of Machine Learning Research
Reinforcement Learning with Approximation Spaces

Fundamenta Informaticae
Estimating interleaved comparison outcomes from historical click data

Proceedings of the 21st ACM international conference on Information and knowledge management
Reusing historical interaction data for faster online learning to rank for IR

Proceedings of the sixth ACM international conference on Web search and data mining
Efficient sample reuse in policy gradients with parameter-based exploration

Neural Computation
Learning exploration strategies in model-based reinforcement learning

Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods

ACM Transactions on Information Systems (TOIS)

Abstract